# Input and Output

In [1]:
import pandas as pd

## Export DataFrame to CSV File
- The `to_csv` method exports a **DataFrame** to a CSV file.
- Its first argument is the filename.
- By default, pandas will include the index. Set the `index` parameter to False to exclude the index.
- The `columns` parameter limits the exported columns.

In [5]:
url= 'https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv'
baby_names= pd.read_csv(url) # pandas downloads the file over HTTPS and parses it directly; no FTP/SFTP connection is involved

# the file is fetched again on every call, so if the publisher updates the dataset, rerunning this cell picks up the latest version
baby_names.head()

Unnamed: 0,Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank
0,2011,FEMALE,HISPANIC,GERALDINE,13,75
1,2011,FEMALE,HISPANIC,GIA,21,67
2,2011,FEMALE,HISPANIC,GIANNA,49,42
3,2011,FEMALE,HISPANIC,GISELLE,38,51
4,2011,FEMALE,HISPANIC,GRACE,36,53


In [9]:
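# each call below overwrites the file written by the previous one
baby_names.to_csv('baby_names.csv') # includes the index as an unnamed first column
baby_names.to_csv('baby_names.csv', index= False) # excludes the index
baby_names.to_csv('baby_names.csv', index= False, columns=['Year of Birth', 'Gender', "Child's First Name"]) # exports only the listed columns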
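
A minimal sketch of how `index` and `columns` change the exported text, using a small made-up DataFrame (the `sample` frame and its values are illustrative and not taken from the NYC dataset). When no path is passed, `to_csv` returns the CSV content as a string, which makes the differences easy to inspect without opening a file:

```python
import pandas as pd

# Hypothetical stand-in for baby_names; the values are made up for illustration.
sample = pd.DataFrame({
    "Year of Birth": [2011, 2011],
    "Gender": ["FEMALE", "MALE"],
    "Child's First Name": ["GIA", "LIAM"],
    "Count": [21, 30],
})

# With no path argument, to_csv returns the CSV content as a string.
with_index = sample.to_csv()
without_index = sample.to_csv(index=False)
subset = sample.to_csv(index=False, columns=["Year of Birth", "Child's First Name"])

# Compare the header row of each variant.
print(with_index.splitlines()[0])     # starts with an empty field for the index
print(without_index.splitlines()[0])
print(subset.splitlines()[0])
```

The leading empty field in the first header is the index column, which is why reading such a file back with `read_csv` produces a column named `Unnamed: 0`.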

## Install openpyxl Library to Read and Write Excel Files

In [12]:
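import openpyxl # must be installed first (for example, with `pip install openpyxl`); pandas uses it to read and write .xlsx files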
import pandas as pd

## Import Excel File into pandas
- The `read_excel` function reads an Excel file/workbook into a **DataFrame**.
- Use the `sheet_name` parameter if the workbook contains multiple worksheets. Pass a single worksheet name or index position, or a list of worksheet names/index positions.
- Pass the `sheet_name` parameter an argument of **None** to include all worksheets.
- When more than one worksheet is requested, pandas returns a Python dictionary. The keys are the worksheet names or index positions that were requested (the worksheet names when `sheet_name` is **None**), and the values are the **DataFrames**.
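
The options above can be sketched end to end without relying on an existing file. The example below is only an illustration: it first writes a tiny two-sheet workbook to a temporary folder (the file name `example.xlsx` and the rows are made up), then reads it back with each form of `sheet_name`. It assumes `openpyxl` is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical worksheets; names and values are made up for illustration.
people_a = pd.DataFrame({"First Name": ["Brandon", "Sean"], "City": ["Miami", "Denver"]})
people_b = pd.DataFrame({"First Name": ["Parker", "Megan"], "City": ["Raleigh", "Austin"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.xlsx"

    # Create the workbook so the example does not depend on an existing file.
    with pd.ExcelWriter(path) as writer:
        people_a.to_excel(writer, sheet_name="Data 1", index=False)
        people_b.to_excel(writer, sheet_name="Data 2", index=False)

    first_only = pd.read_excel(path)                          # default: first worksheet only
    by_name = pd.read_excel(path, sheet_name="Data 2")        # one worksheet, selected by name
    several = pd.read_excel(path, sheet_name=[0, "Data 2"])   # dictionary keyed by what was requested
    everything = pd.read_excel(path, sheet_name=None)         # dictionary of every worksheet

print(type(first_only).__name__)
print(list(several.keys()))
print(list(everything.keys()))
```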

In [14]:
pd.read_excel("Data - Single Worksheet.xlsx")

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F


In [35]:
t= {1: 'test', 2: 'testing'}
zip(t.keys(), t.values()) # zip is lazy: it returns an iterator, so wrap it in list() to see the (key, value) pairs

<zip at 0x21ad30bcbc0>

In [None]:
pd.read_excel("Data - Multiple Worksheets.xlsx") # by default, pandas imports only the first worksheet
pd.read_excel("Data - Multiple Worksheets.xlsx", sheet_name= 'Data 1')
pd.read_excel("Data - Multiple Worksheets.xlsx", sheet_name= 'Data 2')

# we can also pass the sheet's index position
pd.read_excel("Data - Multiple Worksheets.xlsx", sheet_name= 0)
pd.read_excel("Data - Multiple Worksheets.xlsx", sheet_name= 1)

# if we want to import more than one sheet...
pd.read_excel("Data - Multiple Worksheets.xlsx", sheet_name= [0,1])
pd.read_excel("Data - Multiple Worksheets.xlsx", sheet_name= ['Data 1', 'Data 2'])
data= pd.read_excel("Data - Multiple Worksheets.xlsx", sheet_name= None) # sheet_name=None tells pandas to load every worksheet instead of a specific one

data 

{'Data 1':   First Name Last Name           City Gender
 0    Brandon     James          Miami      M
 1       Sean   Hawkins         Denver      M
 2       Judy       Day    Los Angeles      F
 3     Ashley      Ruiz  San Francisco      F
 4  Stephanie     Gomez       Portland      F,
 'Data 2':   First Name Last Name           City Gender
 0     Parker     Power        Raleigh      F
 1    Preston  Prescott   Philadelphia      F
 2    Ronaldo   Donaldo         Bangor      M
 3      Megan   Stiller  San Francisco      M
 4     Bustin    Jieber         Austin      F}

- The output above is only the printed representation of the dictionary. Each value in it is a regular pandas **DataFrame**, which we can verify by accessing the values by key.

In [55]:
display(data['Data 1'])
print()
display(data['Data 2'])

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F





Unnamed: 0,First Name,Last Name,City,Gender
0,Parker,Power,Raleigh,F
1,Preston,Prescott,Philadelphia,F
2,Ronaldo,Donaldo,Bangor,M
3,Megan,Stiller,San Francisco,M
4,Bustin,Jieber,Austin,F


In [52]:
# combine both worksheets into a single DataFrame with a fresh 0..n-1 index
pd.concat([data['Data 1'], data['Data 2']], ignore_index=True)

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F
5,Parker,Power,Raleigh,F
6,Preston,Prescott,Philadelphia,F
7,Ronaldo,Donaldo,Bangor,M
8,Megan,Stiller,San Francisco,M
9,Bustin,Jieber,Austin,F
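
Once the worksheets are stacked, the combined table no longer shows which worksheet each row came from. One possible way to keep that information, sketched here under the assumption that `data` is a dictionary of DataFrames like the one returned by `sheet_name=None`, is to pass the dictionary itself to `pd.concat`. The `Worksheet` and `Row` level names are arbitrary labels chosen for this example.

```python
import pandas as pd

# Stand-in for the dictionary returned by read_excel(..., sheet_name=None);
# the rows are a shortened, illustrative copy of the worksheets above.
data = {
    "Data 1": pd.DataFrame({"First Name": ["Brandon", "Sean"], "City": ["Miami", "Denver"]}),
    "Data 2": pd.DataFrame({"First Name": ["Parker", "Megan"], "City": ["Raleigh", "San Francisco"]}),
}

# Passing the dictionary itself makes its keys the outer level of the index.
combined = pd.concat(data, names=["Worksheet", "Row"])

# Move the worksheet name into a regular column and discard the per-sheet row numbers.
combined = combined.reset_index(level="Worksheet").reset_index(drop=True)

print(combined)
```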


## Export Excel File from pandas
- The **ExcelWriter** class writes one or more **DataFrames** to an Excel file.
- Use a context manager (the `with` keyword) in combination with the **ExcelWriter** object and an assigned variable.
- Invoke the `to_excel` method on every **DataFrame** to include in the Excel workbook and pass in the **ExcelWriter** object as the first argument.
- The `to_excel` method supports `sheet_name`, `index`, and `columns` parameters.

In [56]:
url= 'https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv'
baby_names= pd.read_csv(url) 
baby_names.head()

Unnamed: 0,Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank
0,2011,FEMALE,HISPANIC,GERALDINE,13,75
1,2011,FEMALE,HISPANIC,GIA,21,67
2,2011,FEMALE,HISPANIC,GIANNA,49,42
3,2011,FEMALE,HISPANIC,GISELLE,38,51
4,2011,FEMALE,HISPANIC,GRACE,36,53


In [59]:
# compare case-insensitively using the vectorized string method
females= baby_names[baby_names['Gender'].str.lower() == 'female']
males= baby_names[baby_names['Gender'].str.lower() == 'male']

In [60]:
with pd.ExcelWriter('NYC Baby Data.xlsx') as excel_file: # `excel_file` is the name bound to the ExcelWriter object
    females.to_excel(excel_file, sheet_name= 'BabyGirls', index= False)
    males.to_excel(excel_file, sheet_name='BabyBoys', index= False, columns=['Year of Birth', "Child's First Name", 'Rank'])

# the `with` block guarantees that the ExcelWriter is closed, and the workbook saved to disk, as soon as the block ends, even if an error is raised inside it. This avoids leaving a partially written or still-open file behind.
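
The same pattern extends to any number of worksheets. As a rough sketch (not the only way to do it), the example below groups a small made-up DataFrame by `Gender`, writes each group to its own worksheet through a single **ExcelWriter**, and reads the workbook back to check the result. The `sample` frame, the file name `by_gender.xlsx`, and the sheet naming are assumptions made for illustration; `openpyxl` must be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for baby_names; the values are made up for illustration.
sample = pd.DataFrame({
    "Gender": ["FEMALE", "MALE", "FEMALE"],
    "Child's First Name": ["GIA", "LIAM", "GRACE"],
    "Rank": [67, 12, 53],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "by_gender.xlsx"

    # The writer stays open for the whole block and is saved and closed on exit.
    with pd.ExcelWriter(path) as writer:
        for gender, group in sample.groupby("Gender"):
            group.to_excel(writer, sheet_name=gender.title(), index=False)

    # Read every worksheet back to confirm what was written.
    written = pd.read_excel(path, sheet_name=None)

for sheet_name, frame in written.items():
    print(sheet_name, len(frame))
```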