# 10. Input and Output

In [1]:
import pandas as pd

### Table of Contents

1. Export DataFrame to CSV File
2. Import Excel File into pandas
3. Export Excel File from pandas


In [2]:
# CSV file with New York City baby-name data, stored online
url = "https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv"
pd.read_csv(url) # pandas can load data directly from a URL

Unnamed: 0,Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank
0,2011,FEMALE,HISPANIC,GERALDINE,13,75
1,2011,FEMALE,HISPANIC,GIA,21,67
2,2011,FEMALE,HISPANIC,GIANNA,49,42
3,2011,FEMALE,HISPANIC,GISELLE,38,51
4,2011,FEMALE,HISPANIC,GRACE,36,53
...,...,...,...,...,...,...
69209,2012,MALE,BLACK NON HISP,CAYDEN,19,52
69210,2013,FEMALE,WHITE NON HISPANIC,Margaret,25,67
69211,2013,FEMALE,WHITE NON HISPANIC,Tamar,10,82
69212,2013,FEMALE,WHITE NON HISPANIC,Amanda,13,79


In [3]:
baby_names = pd.read_csv(url)
baby_names.head()

Unnamed: 0,Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank
0,2011,FEMALE,HISPANIC,GERALDINE,13,75
1,2011,FEMALE,HISPANIC,GIA,21,67
2,2011,FEMALE,HISPANIC,GIANNA,49,42
3,2011,FEMALE,HISPANIC,GISELLE,38,51
4,2011,FEMALE,HISPANIC,GRACE,36,53


## 1. Export DataFrame to CSV File

- The `to_csv` method exports a **DataFrame** to a CSV file.
- Its first argument is the filename.
- By default, pandas will include the index. Set the `index` parameter to `False` to exclude the index.
- The `columns` parameter limits the exported columns.
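
The effect of these parameters is easy to see on a small DataFrame, since `to_csv` returns the CSV text as a string when no filename is given. The following is only a sketch: the three-column DataFrame `df` is a made-up stand-in for `baby_names`, built from two rows of the dataset.

```python
import pandas as pd

# Made-up miniature of the baby_names DataFrame (illustration only)
df = pd.DataFrame({
    "Child's First Name": ["GIA", "GRACE"],
    "Count": [21, 36],
    "Rank": [67, 53],
})

# Without a filename, to_csv returns the CSV content as a string
with_index = df.to_csv()                 # default: the index is the first column
without_index = df.to_csv(index=False)   # index left out
two_columns = df.to_csv(index=False, columns=["Child's First Name", "Count"])

# Compare the header line of each variant
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
print(two_columns.splitlines()[0])
```

The first header line starts with a comma because the index column has no name.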


In [4]:
baby_names.to_csv() # all methods that export data begin with 'to'
# a CSV file is nothing more than one long string, so calling the method without a filename returns that string
# this is the content that would be written to the CSV file

",Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank\n0,2011,FEMALE,HISPANIC,GERALDINE,13,75\n1,2011,FEMALE,HISPANIC,GIA,21,67\n2,2011,FEMALE,HISPANIC,GIANNA,49,42\n3,2011,FEMALE,HISPANIC,GISELLE,38,51\n4,2011,FEMALE,HISPANIC,GRACE,36,53\n5,2011,FEMALE,HISPANIC,GUADALUPE,26,62\n6,2011,FEMALE,HISPANIC,HAILEY,126,8\n7,2011,FEMALE,HISPANIC,HALEY,14,74\n8,2011,FEMALE,HISPANIC,HANNAH,17,71\n9,2011,FEMALE,HISPANIC,HAYLEE,17,71\n10,2011,FEMALE,HISPANIC,HAYLEY,13,75\n11,2011,FEMALE,HISPANIC,HAZEL,10,78\n12,2011,FEMALE,HISPANIC,HEAVEN,15,73\n13,2011,FEMALE,HISPANIC,HEIDI,15,73\n14,2011,FEMALE,HISPANIC,HEIDY,16,72\n15,2011,FEMALE,HISPANIC,HELEN,13,75\n16,2011,FEMALE,HISPANIC,IMANI,11,77\n17,2011,FEMALE,HISPANIC,INGRID,11,77\n18,2011,FEMALE,HISPANIC,IRENE,11,77\n19,2011,FEMALE,HISPANIC,IRIS,10,78\n20,2011,FEMALE,HISPANIC,ISABEL,28,60\n21,2011,FEMALE,HISPANIC,ISABELA,10,78\n22,2011,FEMALE,HISPANIC,ISABELLA,331,1\n23,2011,FEMALE,HISPANIC,ISABELLE,18,70\n24,2011,FEMALE,HISPANIC,ISIS,13,75\

In [5]:
baby_names.to_csv("baby_names.csv") # write to csv file, by default index is included
baby_names_csv = pd.read_csv("baby_names.csv") # read in csv file
baby_names_csv.head() 

Unnamed: 0.1,Unnamed: 0,Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank
0,0,2011,FEMALE,HISPANIC,GERALDINE,13,75
1,1,2011,FEMALE,HISPANIC,GIA,21,67
2,2,2011,FEMALE,HISPANIC,GIANNA,49,42
3,3,2011,FEMALE,HISPANIC,GISELLE,38,51
4,4,2011,FEMALE,HISPANIC,GRACE,36,53
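
The extra unnamed column in this output appears because the saved index is read back as an ordinary column. Besides writing with `index=False`, another option is to tell `read_csv` which column holds the index via its `index_col` parameter. A minimal sketch, assuming a small made-up DataFrame and an in-memory buffer in place of a file on disk:

```python
import io
import pandas as pd

# Made-up miniature of the baby_names DataFrame (illustration only)
df = pd.DataFrame({"Child's First Name": ["GIA", "GRACE"], "Count": [21, 36]})

# Write with the default index, using an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer)

# Reading back without further arguments turns the saved index into a column
buffer.seek(0)
naive = pd.read_csv(buffer)

# index_col=0 tells pandas to use the first column as the index again
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(list(naive.columns))
print(list(restored.columns))
```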


In [6]:
baby_names.to_csv("baby_names_2.csv", index = False) # write to csv file, and set index to False to omit default index
baby_names_csv_2 = pd.read_csv("baby_names_2.csv") # read in csv file
baby_names_csv_2.head() 

Unnamed: 0,Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank
0,2011,FEMALE,HISPANIC,GERALDINE,13,75
1,2011,FEMALE,HISPANIC,GIA,21,67
2,2011,FEMALE,HISPANIC,GIANNA,49,42
3,2011,FEMALE,HISPANIC,GISELLE,38,51
4,2011,FEMALE,HISPANIC,GRACE,36,53


In [7]:
baby_names.to_csv("baby_names_3.csv", index = False, columns=("Year of Birth","Child's First Name","Count")) # to include only certain columns
baby_names_csv_3 = pd.read_csv("baby_names_3.csv") # read in csv file
baby_names_csv_3.head() 

Unnamed: 0,Year of Birth,Child's First Name,Count
0,2011,GERALDINE,13
1,2011,GIA,21
2,2011,GIANNA,49
3,2011,GISELLE,38
4,2011,GRACE,36


## 2. Import Excel File into pandas

- The `read_excel` function reads an Excel file/workbook into a **DataFrame**.
- Use the `sheet_name` parameter if the workbook contains multiple worksheets. Pass a single worksheet name or index position, or a list of worksheet names/index positions.
- Pass `None` to the `sheet_name` parameter to include all worksheets.
- Pandas will store multiple worksheets in a Python dictionary. The keys will be the worksheet names (or the index positions, if those were passed in), and the values will be the **DataFrames**.


In [8]:
# excel file with a single worksheet
pd.read_excel("data_single_worksheet.xlsx")

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F


In [9]:
# Excel file with multiple worksheets
pd.read_excel("data_multiple_worksheets.xlsx")

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F


In [10]:
# no error message, BUT Pandas only imports FIRST worksheet by default
pd.read_excel("data_multiple_worksheets.xlsx", sheet_name = "Data 1") # we can specify which worksheet we want to read by name
pd.read_excel("data_multiple_worksheets.xlsx", sheet_name = 0) # or by index
pd.read_excel("data_multiple_worksheets.xlsx", sheet_name = "Data 2") 

Unnamed: 0,First Name,Last Name,City,Gender
0,Parker,Power,Raleigh,F
1,Preston,Prescott,Philadelphia,F
2,Ronaldo,Donaldo,Bangor,M
3,Megan,Stiller,San Francisco,M
4,Bustin,Jieber,Austin,F


In [11]:
# however, how can we read both worksheets?
pd.read_excel("data_multiple_worksheets.xlsx", sheet_name = ["Data 1","Data 2"]) # we can pass in a list
pd.read_excel("data_multiple_worksheets.xlsx", sheet_name = [0,1])
# the result is a dict of DataFrames, shown here by its string representation

{0:   First Name Last Name           City Gender
 0    Brandon     James          Miami      M
 1       Sean   Hawkins         Denver      M
 2       Judy       Day    Los Angeles      F
 3     Ashley      Ruiz  San Francisco      F
 4  Stephanie     Gomez       Portland      F,
 1:   First Name Last Name           City Gender
 0     Parker     Power        Raleigh      F
 1    Preston  Prescott   Philadelphia      F
 2    Ronaldo   Donaldo         Bangor      M
 3      Megan   Stiller  San Francisco      M
 4     Bustin    Jieber         Austin      F}

In [12]:
# However: for both of these options we need to know how many sheets are present, or even their names
# easier way to import all worksheets without further specification : use 'None' as value for sheet_name parameter
pd.read_excel("data_multiple_worksheets.xlsx", sheet_name = None)

{'Data 1':   First Name Last Name           City Gender
 0    Brandon     James          Miami      M
 1       Sean   Hawkins         Denver      M
 2       Judy       Day    Los Angeles      F
 3     Ashley      Ruiz  San Francisco      F
 4  Stephanie     Gomez       Portland      F,
 'Data 2':   First Name Last Name           City Gender
 0     Parker     Power        Raleigh      F
 1    Preston  Prescott   Philadelphia      F
 2    Ronaldo   Donaldo         Bangor      M
 3      Megan   Stiller  San Francisco      M
 4     Bustin    Jieber         Austin      F,
 'Data 3':   First Name  Last Name     City Gender
 0     Robert     Miller  Seattle      M
 1       Tara     Garcia  Phoenix      F
 2    Raphael  Rodriguez  Orlando      M}

In [13]:
data = pd.read_excel("data_multiple_worksheets.xlsx", sheet_name = None)
# to get full DataFrame object
data["Data 1"]

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F


In [14]:
data["Data 2"]

Unnamed: 0,First Name,Last Name,City,Gender
0,Parker,Power,Raleigh,F
1,Preston,Prescott,Philadelphia,F
2,Ronaldo,Donaldo,Bangor,M
3,Megan,Stiller,San Francisco,M
4,Bustin,Jieber,Austin,F


In [15]:
# to combine several sheets into one DataFrame, you can use pandas functions such as pd.concat
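
One possible way to do this is `pd.concat`, which accepts a dictionary of DataFrames and uses the keys as an outer index level. The sketch below does not read the workbook. The `sheets` dictionary is a hand-built stand-in for the result of `read_excel(..., sheet_name=None)`, and the level names `Sheet` and `Row` are arbitrary choices for illustration.

```python
import pandas as pd

# Hand-built stand-in for the dict returned by read_excel(..., sheet_name=None)
sheets = {
    "Data 1": pd.DataFrame({"First Name": ["Brandon", "Sean"],
                            "City": ["Miami", "Denver"]}),
    "Data 2": pd.DataFrame({"First Name": ["Parker", "Preston"],
                            "City": ["Raleigh", "Philadelphia"]}),
}

# The dict keys become the outer level of a two-level index
combined = pd.concat(sheets, names=["Sheet", "Row"])

# Move the sheet name into a regular column and renumber the rows
combined = combined.reset_index(level="Sheet").reset_index(drop=True)

print(combined)
```

Keeping the sheet name as a column preserves the information about where each row came from.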

## 3. Export Excel File from pandas

- The `ExcelWriter` class writes one or more **DataFrames** to an Excel file.
- Use a context manager (the `with` keyword) in combination with the `ExcelWriter` object and an assigned variable.
- Invoke the `to_excel` method on every **DataFrame** to include in the Excel workbook and pass in the `ExcelWriter` object as the first argument.
- The `to_excel` method supports `sheet_name`, `index`, and `columns` parameters.


In [16]:
url = "https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv"
baby_names = pd.read_csv(url)
baby_names.head()

Unnamed: 0,Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank
0,2011,FEMALE,HISPANIC,GERALDINE,13,75
1,2011,FEMALE,HISPANIC,GIA,21,67
2,2011,FEMALE,HISPANIC,GIANNA,49,42
3,2011,FEMALE,HISPANIC,GISELLE,38,51
4,2011,FEMALE,HISPANIC,GRACE,36,53


In [17]:
# we want to save this dataset to two Excel worksheets: one for males, the other for females

# begin by creating two DataFrames
females = baby_names[baby_names["Gender"] == "FEMALE"]
males = baby_names[baby_names["Gender"] == "MALE"]

In [18]:
# instantiate an object of the ExcelWriter class to write an Excel file
with pd.ExcelWriter("NYC Baby Data.xlsx") as name_for_excel_file: # takes the name of the Excel file as its argument
    # the 'with' context manager lets us run a sequence of operations on the same object in an indented block,
    # much like the indented body of a function
    # here, all operations apply to the ExcelWriter object created above
    # inside this block, that object is available under the name given after 'as'
    females.to_excel(name_for_excel_file,sheet_name="Females",index=False)
    males.to_excel(name_for_excel_file,sheet_name="Males",index=False, columns=["Year of Birth","Child's First Name","Rank"])
    # the with statement makes sure the file is saved and closed once the block ends
