# Input and Output

In [2]:
import pandas as pd

<a href="https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv"><u>**URL for the dataset**</u></a> used in this module

#### Connecting to a dynamic dataset via URL
Because the data is read from the URL each time, any changes to the source data are picked up whenever the cell is re-run.

In [6]:
# Reading directly from a URL
url = "https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv"
nyc_names = pd.read_csv(url)
nyc_names.head()

Unnamed: 0,Year of Birth,Gender,Ethnicity,Child's First Name,Count,Rank
0,2011,FEMALE,HISPANIC,GERALDINE,13,75
1,2011,FEMALE,HISPANIC,GIA,21,67
2,2011,FEMALE,HISPANIC,GIANNA,49,42
3,2011,FEMALE,HISPANIC,GISELLE,38,51
4,2011,FEMALE,HISPANIC,GRACE,36,53


In [9]:
nyc_names.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69214 entries, 0 to 69213
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Year of Birth       69214 non-null  int64 
 1   Gender              69214 non-null  object
 2   Ethnicity           69214 non-null  object
 3   Child's First Name  69214 non-null  object
 4   Count               69214 non-null  int64 
 5   Rank                69214 non-null  int64 
dtypes: int64(3), object(3)
memory usage: 3.2+ MB


## Export DataFrame to CSV File
- The `to_csv` method exports a **DataFrame** to a CSV file.
- Its first argument is the filename.
- By default, pandas will include the index. Set the `index` parameter to False to exclude the index.
- The `columns` parameter limits the exported columns.
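
To see why the `index` parameter matters, here is a minimal sketch that round-trips a small DataFrame through an in-memory buffer rather than a file on disk. The sample data and the `StringIO` buffers are assumptions made for illustration; they are not part of the NYC dataset.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data, used only for illustration
sample = pd.DataFrame({"Name": ["GIA", "GRACE"], "Count": [21, 36]})

# Default export: the index is written as an extra, unnamed first column
with_index = StringIO()
sample.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the DataFrame's own columns are written
without_index = StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))     # ['Unnamed: 0', 'Name', 'Count']
print(list(reloaded_without_index.columns))  # ['Name', 'Count']
```

When a CSV exported with the default settings is read back, the saved index shows up as an `Unnamed: 0` column, which is usually a good reason to pass `index=False`.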

##### We'll also be reading data from Excel files (.xlsx). Note that **pandas depends on the `openpyxl` library** for read and write operations on these files.
Make sure it is installed in your virtual environment.
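
One way to confirm the dependency before touching any Excel file is to ask Python whether the package can be found. This is a small sketch using only the standard library; the helper name `is_installed` is hypothetical and chosen for this example.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be imported."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("openpyxl"):
    print("openpyxl is available")
else:
    print("openpyxl is missing; install it with: pip install openpyxl")
```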

In [10]:
# Exports to a csv file with the specified name
nyc_names.to_csv("nyc_names.csv")

In [12]:
# Control whether the CSV export includes the default index
nyc_names.to_csv("nyc_names_no_index.csv",index=False)

In [14]:
# Control which columns to include in the exported file
nyc_names.to_csv("nyc_names_select_columns.csv",index=False,columns=["Year of Birth","Gender","Ethnicity"])

In [16]:
# Reading from an xlsx file
pd.read_excel("Data - Single Worksheet.xlsx")

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F


In [21]:
# Reading from an Excel file containing multiple worksheets. Note that only the first worksheet is loaded by default
pd.read_excel("Data - Multiple Worksheets.xlsx")

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F


In [24]:
# Pass the name of the worksheet to load (the notebook displays only the last expression's output)
pd.read_excel("Data - Multiple Worksheets.xlsx",sheet_name="Data 1")
pd.read_excel("Data - Multiple Worksheets.xlsx",sheet_name="Data 2")

Unnamed: 0,First Name,Last Name,City,Gender
0,Parker,Power,Raleigh,F
1,Preston,Prescott,Philadelphia,F
2,Ronaldo,Donaldo,Bangor,M
3,Megan,Stiller,San Francisco,M
4,Bustin,Jieber,Austin,F


##### The `sheet_name` parameter also accepts a list of worksheet names, but **note that in this case pandas returns a dict**. Each worksheet name becomes a key, and the DataFrame holding that worksheet's content becomes its value.
##### The output below looks different only because it is the string representation of the dict; each value is still a regular DataFrame.
##### Indexing the dict with a key returns the respective DataFrame.
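
The dict behaviour described above can be sketched without an Excel file. The dict below is built by hand as a stand-in for what `read_excel` returns when given a list of sheet names; the rows are a trimmed copy of the sample data and the variable names are assumptions.

```python
import pandas as pd

# Hand-built stand-in for the dict that read_excel returns for a list of sheets
sheets = {
    "Data 1": pd.DataFrame({"First Name": ["Brandon", "Sean"], "City": ["Miami", "Denver"]}),
    "Data 2": pd.DataFrame({"First Name": ["Parker", "Preston"], "City": ["Raleigh", "Philadelphia"]}),
}

# Each value is a regular DataFrame, so every DataFrame method is available
first_sheet = sheets["Data 1"]
print(type(first_sheet).__name__)  # DataFrame

# Loop over the worksheets by name
for sheet_name, frame in sheets.items():
    print(sheet_name, frame.shape)
```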

In [None]:
pd.read_excel("Data - Multiple Worksheets.xlsx",sheet_name=["Data 1","Data 2"])

{'Data 1':   First Name Last Name           City Gender
 0    Brandon     James          Miami      M
 1       Sean   Hawkins         Denver      M
 2       Judy       Day    Los Angeles      F
 3     Ashley      Ruiz  San Francisco      F
 4  Stephanie     Gomez       Portland      F,
 'Data 2':   First Name Last Name           City Gender
 0     Parker     Power        Raleigh      F
 1    Preston  Prescott   Philadelphia      F
 2    Ronaldo   Donaldo         Bangor      M
 3      Megan   Stiller  San Francisco      M
 4     Bustin    Jieber         Austin      F}

In [26]:
type(pd.read_excel("Data - Multiple Worksheets.xlsx",sheet_name=["Data 1","Data 2"]))

dict

##### We can pass `None` to the `sheet_name` parameter to load every worksheet without naming each one, but the result is still a dict.
This is more convenient when a file contains many worksheets.

In [28]:
data_load = pd.read_excel("Data - Multiple Worksheets.xlsx",sheet_name=None)
data_load

{'Data 1':   First Name Last Name           City Gender
 0    Brandon     James          Miami      M
 1       Sean   Hawkins         Denver      M
 2       Judy       Day    Los Angeles      F
 3     Ashley      Ruiz  San Francisco      F
 4  Stephanie     Gomez       Portland      F,
 'Data 2':   First Name Last Name           City Gender
 0     Parker     Power        Raleigh      F
 1    Preston  Prescott   Philadelphia      F
 2    Ronaldo   Donaldo         Bangor      M
 3      Megan   Stiller  San Francisco      M
 4     Bustin    Jieber         Austin      F}

In [29]:
type(data_load)

dict

In [32]:
# Each key holds the respective dataframe
data_load["Data 1"]
data_load["Data 2"]


Unnamed: 0,First Name,Last Name,City,Gender
0,Parker,Power,Raleigh,F
1,Preston,Prescott,Philadelphia,F
2,Ronaldo,Donaldo,Bangor,M
3,Megan,Stiller,San Francisco,M
4,Bustin,Jieber,Austin,F


##### To load all the worksheets into a single DataFrame, we join them using the `concat` function.
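
When the number of worksheets is not known in advance, the whole dict can be combined in one step. The sketch below uses a hand-built dict as a stand-in for the result of `read_excel(..., sheet_name=None)`; the added `Sheet` column is an assumption of this example, included so each row keeps a record of the worksheet it came from.

```python
import pandas as pd

# Hand-built stand-in for a dict of worksheets loaded with sheet_name=None
sheets = {
    "Data 1": pd.DataFrame({"First Name": ["Brandon", "Sean"], "City": ["Miami", "Denver"]}),
    "Data 2": pd.DataFrame({"First Name": ["Parker", "Preston"], "City": ["Raleigh", "Philadelphia"]}),
}

# Tag each row with its worksheet name, then stack everything into one DataFrame
combined = pd.concat(
    [frame.assign(Sheet=name) for name, frame in sheets.items()],
    ignore_index=True,  # renumber the rows 0..n-1 instead of repeating 0, 1, ...
)
print(combined)
```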

## Install openpyxl Library to Read and Write Excel Files

## Import Excel File into pandas
- The `read_excel` function reads an Excel file/workbook into a **DataFrame**.
- Use the `sheet_name` parameter if the workbook contains multiple worksheets. Pass a single worksheet name or a list of worksheet names/index positions.
- Pass the `sheet_name` parameter an argument of **None** to include all worksheets.
- Pandas will store multiple worksheets in a Python dictionary. The keys will be the worksheet names, and the values will be the **DataFrames**.
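
The points above mention index positions as well as names. As a rough sketch, the example below builds a tiny two-sheet workbook in memory (so it needs `openpyxl` installed) and then selects one sheet by position and one by name. The `BytesIO` buffer and the sample rows are assumptions for illustration, not files from this module.

```python
from io import BytesIO

import pandas as pd

# Build a small two-sheet workbook in memory (requires openpyxl)
buffer = BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"City": ["Miami", "Denver"]}).to_excel(writer, sheet_name="Data 1", index=False)
    pd.DataFrame({"City": ["Raleigh", "Bangor"]}).to_excel(writer, sheet_name="Data 2", index=False)

# Rewind the buffer, then select sheets by position and by name in one call
buffer.seek(0)
selected = pd.read_excel(buffer, sheet_name=[0, "Data 2"])

# The dict keys mirror what was requested: the position 0 and the name "Data 2"
print(list(selected.keys()))  # [0, 'Data 2']
print(selected[0])
```

Note that a sheet requested by position is stored under that integer key, not under the worksheet's name.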

In [36]:
pd.concat([data_load["Data 1"],data_load["Data 2"]])
pd.concat([data_load["Data 1"],data_load["Data 2"]],ignore_index=True)

Unnamed: 0,First Name,Last Name,City,Gender
0,Brandon,James,Miami,M
1,Sean,Hawkins,Denver,M
2,Judy,Day,Los Angeles,F
3,Ashley,Ruiz,San Francisco,F
4,Stephanie,Gomez,Portland,F
5,Parker,Power,Raleigh,F
6,Preston,Prescott,Philadelphia,F
7,Ronaldo,Donaldo,Bangor,M
8,Megan,Stiller,San Francisco,M
9,Bustin,Jieber,Austin,F


## Export Excel File from pandas
- The **ExcelWriter** class writes one or more **DataFrames** to an Excel file.
- Use a context manager (the `with` keyword) in combination with the **ExcelWriter** object and an assigned variable.
- Invoke the `to_excel` method on every **DataFrame** to include in the Excel workbook and pass in the **ExcelWriter** object as the first argument.
- The `to_excel` method supports `sheet_name`, `index`, and `columns` parameters.

In [43]:
# Select the rows for each gender from the nyc_names dataset
nyc_names[nyc_names["Gender"]=="FEMALE"]
nyc_names_female = nyc_names[nyc_names["Gender"]=="FEMALE"] 
nyc_names_male = nyc_names[nyc_names["Gender"]=="MALE"]

- The original dataset (the CSV from the URL) contains names for both the male and female genders.
- We split the rows by gender and assigned them to separate DataFrames.
- We'll now export these DataFrames to a new Excel file, with a separate worksheet for each DataFrame (i.e. one per gender).
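
As an aside, the same split can be done without writing one filter per gender. The sketch below uses `groupby` on a few made-up rows shaped like the NYC names data; this is an alternative to the boolean filters used in this notebook, and the sample rows and the `by_gender` name are assumptions.

```python
import pandas as pd

# Hypothetical rows in the same shape as the NYC names data
names = pd.DataFrame(
    {
        "Gender": ["FEMALE", "MALE", "FEMALE", "MALE"],
        "Child's First Name": ["GIA", "LIAM", "GRACE", "NOAH"],
        "Rank": [67, 1, 53, 2],
    }
)

# One DataFrame per distinct value in the Gender column
by_gender = {gender: group for gender, group in names.groupby("Gender")}

print(sorted(by_gender))  # ['FEMALE', 'MALE']
print(by_gender["FEMALE"])
```

Each value in `by_gender` is an ordinary DataFrame, so it can be exported with `to_excel` in the same way as the filtered DataFrames.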


In [45]:
# Export each DataFrame to a separate worksheet in a single Excel file using ExcelWriter
# The context manager handles creating the Excel file and writing the DataFrames to it
with pd.ExcelWriter("NYC_Names_Filtered.xlsx") as excel_file_f:
    nyc_names_female.to_excel(excel_file_f,sheet_name="Female Names",index=False,columns=["Year of Birth","Ethnicity","Child's First Name","Rank"])
    nyc_names_male.to_excel(excel_file_f,sheet_name="Male Names",index=False,columns=["Year of Birth","Ethnicity","Child's First Name","Rank"])

***End of this section***
____