# Pandas Reading and Writing
The main purpose of this notebook is to get you to look at the pandas documentation  
[pandas Docs](https://pandas.pydata.org/pandas-docs/stable/index.html)  

#### Import pandas
and set the sample data directory and filename

In [None]:
import pandas as pd

In [None]:
# just check the directory and file are there
!dir "../data"

In [None]:
# up one level to topic03 and then down into data
datadir = "../data/"
filename = "people-100.csv"

#### Read in the csv file
<code>df = pd.read_csv(file)</code>

In [None]:
df = pd.read_csv(datadir + filename)

Check what you got using
<code>df.head()</code>
<code>df.info()</code>


In [None]:
df.head()

In [None]:
df.info()

### Reading in dates
Hmm! I would like the Date of birth column to be a date object.  
Use
<code>pd.read_csv(file, parse_dates=[col])</code>, where each entry in the list is a column number or a column name.

In [None]:
df = pd.read_csv(datadir+filename, parse_dates=['Date of birth'])
df.info()
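
A minimal, self-contained sketch of the same idea, assuming a tiny made-up CSV held in memory (the names and dates are invented for illustration and are not taken from people-100.csv). It compares the column type with and without `parse_dates`:

```python
import io
import pandas as pd

# Hypothetical sample data, standing in for a file on disk
csv_text = "First Name,Date of birth\nAda,1990-05-17\nGrace,1985-12-09\n"

# Default read: the date column comes in as text
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates: the column is converted to datetimes
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date of birth"])

print(plain["Date of birth"].dtype)
print(parsed["Date of birth"].dtype)

# Once parsed, the .dt accessor is available
print(parsed["Date of birth"].dt.year.tolist())
```

Once the column holds real datetimes you can sort, filter, and extract parts of the date through `.dt`, which is not possible on the text version.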

### While I am here
If you want to limit which columns are read in, use
<code>usecols=[list of cols]</code>

In [None]:
names_of_columns = ['First Name', 'Last Name', 'Email', 'Phone', 'Date of birth']
df = pd.read_csv(datadir + filename, usecols=names_of_columns)
df.head(3)
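
Here is a small sketch of `usecols` on in-memory data, assuming an invented three-row CSV (the values and the `wanted` list are hypothetical). Only the listed columns end up in the DataFrame:

```python
import io
import pandas as pd

# Hypothetical sample data with four columns
csv_text = (
    "Index,First Name,Last Name,Email\n"
    "1,Ada,Lovelace,ada@example.com\n"
    "2,Grace,Hopper,grace@example.com\n"
)

# Ask for only two of the four columns
wanted = ["First Name", "Email"]
subset = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

print(subset.columns.tolist())
print(subset.shape)
```

Note that `usecols` selects columns but does not reorder them: the result keeps the order found in the file.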

### If there is no header row and you want to supply names
Use <code>header=None</code> and <code>names</code>.  
Use <code>index_col</code> to specify the index column.

In [None]:
filename = "people-100-no-header.csv"

In [None]:
# without header=None the first data row is treated as the column names
df = pd.read_csv(datadir + filename)
df.head(3)

In [None]:
# header=None: the file has no header row; index_col=0: use the first column as the index
df = pd.read_csv(datadir + filename, header=None, index_col=0)
df.head(3)

In [None]:
names=['index','id','firstname','lastname','sex','email','phone','DOB','occupation']
df= pd.read_csv(datadir+filename, header=None,index_col=0,names=names, parse_dates=['DOB'])
df.head(3)
#df.info()
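
To see why `header=None` matters, here is a minimal sketch on an invented two-row CSV with no header line (the data and column names are hypothetical). Without `header=None`, pandas uses the first data row as the column names and you lose a record:

```python
import io
import pandas as pd

# Hypothetical headerless data: two records, three fields each
csv_text = "1,Ada,Lovelace\n2,Grace,Hopper\n"

# Default behaviour: the first row is consumed as the header
wrong = pd.read_csv(io.StringIO(csv_text))

# header=None keeps every row as data; names supplies the labels;
# index_col picks which of those named columns becomes the index
right = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["id", "firstname", "lastname"],
    index_col="id",
)

print(len(wrong), len(right))
print(right.columns.tolist())
```
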

### Reading from HTTPS or even S3 buckets
You can read files directly from the cloud into a DataFrame.

In [None]:
df = pd.read_csv("https://drive.google.com/uc?id=1zO8ekHWx9U7mrbx_0Hoxxu6od7uxJqWw&export=download")
df.head(3)

Or even an S3 bucket.  
I had to <code>pip install s3fs</code> on my machine for this to work.

In [None]:
# It can take a while to find a file you want
# this file contains weather info for Phoenix in 2020
remote_file = "s3://noaa-gsod-pds/2020/72278023183.csv"

# anon=True asks for anonymous access, so no AWS credentials are needed
df = pd.read_csv(remote_file, storage_options={"anon": True})
df.head(3)

## Writing data

If I do not know a data set, I like to write it out to an Excel file so that I can inspect it.

In [None]:
workbookFileName = 's3Data.xlsx'
df.to_excel(workbookFileName, sheet_name='phoenix', index=False)
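
The `index=False` argument stops pandas from writing the row index as an extra column. As a rough sketch of what it changes, the example below uses `to_csv` on a small invented DataFrame (an assumption made so the output can be printed as text; `to_csv` accepts the same `index` argument as `to_excel`):

```python
import io
import pandas as pd

# Hypothetical data, not the real weather file
df_small = pd.DataFrame({"city": ["Phoenix", "Tucson"], "temp": [41.5, 39.0]})

# Default: the row index is written as an unnamed first column
with_index = io.StringIO()
df_small.to_csv(with_index)

# index=False: only the real columns are written
without_index = io.StringIO()
df_small.to_csv(without_index, index=False)

print(with_index.getvalue().splitlines()[0])     # → ,city,temp
print(without_index.getvalue().splitlines()[0])  # → city,temp
```
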

More information is in the pandas documentation:  
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-excel-writer