# [Creating, reading, and writing workbook](https://www.kaggle.com/residentmario/creating-reading-and-writing-workbook)

## Introduction and relevant resources

This is the first notebook in the [Learn Pandas](https://www.kaggle.com/learn/pandas) track.  
These exercises assume some prior experience with Pandas.  
Each page has a list of relevant resources that you can use for reference, and the top item in each list has been chosen specifically to help you with the exercises on that page.  
The first step in most data science projects is reading in the data.  
In this section, you will be using `pandas` to create `Series` and `DataFrame` objects, both by hand and by reading data files.  
The Relevant Resources, as promised:  
* **[Creating, Reading and Writing Reference](https://www.kaggle.com/residentmario/creating-reading-and-writing-reference)**
* **[General Pandas Cheat Sheet](https://assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf)**

## Setup

In [72]:
import pandas as pd
pd.set_option('display.max_rows', 5)  # full option name; the short form can be ambiguous in newer pandas
from learntools.advanced_pandas.creating_reading_writing import *

### Checking Answers

You can check your answer to each exercise with the `check_qN` function imported in the code cell above, replacing `N` with the exercise number.  
For example, here's how you would check an incorrect answer to exercise 1:

In [73]:
check_q1(pd.DataFrame())

False

A correct answer would return `True`.  
If you get stuck, run `print(answer_qN())` to see the solution.
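To make the `True`/`False` behaviour concrete, here is a minimal sketch of how a checker of this kind could work. It is not the `learntools` implementation; `check_frame` is a hypothetical helper that relies only on `DataFrame.equals`.

```python
import pandas as pd


def check_frame(candidate, expected):
    """Hypothetical checker: truthy only when both frames match."""
    # DataFrame.equals compares the labels and the values of both frames.
    return candidate.equals(expected)


expected = pd.DataFrame({'Apples': [30], 'Bananas': [21]})

# An empty frame is rejected, an identical frame is accepted.
print(check_frame(pd.DataFrame(), expected))
print(check_frame(pd.DataFrame({'Apples': [30], 'Bananas': [21]}), expected))
```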

## Exercises

**Exercise 1**  
Create a `DataFrame`:

In [74]:
data = {'Apples': [30], 'Bananas': [21]}
pd.DataFrame(data=data)

| | Apples | Bananas |
|---|---|---|
| 0 | 30 | 21 |


In [75]:
df2 = pd.DataFrame([(30, 21)], columns=['Apples', 'Bananas'])
df2

| | Apples | Bananas |
|---|---|---|
| 0 | 30 | 21 |


In [76]:
answer_q1()

pd.DataFrame({'Apples': [30], 'Bananas': [21]})
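The cells above build the same one-row table in two ways: from a dict of columns and from a list of row tuples. A short sketch, using only the values from the exercise, suggests the two constructions are interchangeable here:

```python
import pandas as pd

# Column-oriented: each dict key becomes a column name.
from_dict = pd.DataFrame({'Apples': [30], 'Bananas': [21]})

# Row-oriented: each tuple is one row, so column names are passed separately.
from_rows = pd.DataFrame([(30, 21)], columns=['Apples', 'Bananas'])

print(from_dict.equals(from_rows))
print(from_dict.shape)
```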


**Exercise 2**  
Create a 2x2 `DataFrame`:

In [77]:
df2x2 = pd.DataFrame([[35, 21], [41, 34]], index=['2017 Sales', '2018 Sales'], columns=['Apples', 'Bananas'])
df2x2

| | Apples | Bananas |
|---|---|---|
| 2017 Sales | 35 | 21 |
| 2018 Sales | 41 | 34 |


In [78]:
answer_q2()

pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=['2017 Sales', '2018 Sales']
)
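The `index` argument replaces the default 0, 1, ... row labels with meaningful ones. As a small illustration (the variable names are placeholders), the sketch below builds the frame both ways and then looks a value up by its row and column label with `.loc`:

```python
import pandas as pd

labels = ['2017 Sales', '2018 Sales']

# Dict of columns plus explicit row labels.
by_column = pd.DataFrame({'Apples': [35, 41], 'Bananas': [21, 34]}, index=labels)

# Nested list of rows plus explicit row and column labels.
by_row = pd.DataFrame([[35, 21], [41, 34]], index=labels, columns=['Apples', 'Bananas'])

print(by_column.equals(by_row))

# Label-based lookup: row '2018 Sales', column 'Apples'.
print(by_column.loc['2018 Sales', 'Apples'])
```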


**Exercise 3**  
Create a `Series`:

In [79]:
pd.Series({'Flour': '4 cups', 'Milk': '1 cup', 'Eggs': '2 large', 'Spam': '1 can'}, name='Dinner')

Eggs     2 large
Flour     4 cups
Milk       1 cup
Spam       1 can
Name: Dinner, dtype: object

In [80]:
answer_q3()

pd.Series(['4 cups', '1 cup', '2 large', '1 can'], 
index=['Flour', 'Milk', 'Eggs', 'Spam'], 
name='Dinner')
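The dict-based output above is listed alphabetically because older pandas releases sorted dict keys. As far as I know, newer releases (pandas 0.23 and later on Python 3.6 and later) keep insertion order instead, so the row order you see may differ. The sketch below compares the dict form with the list-plus-index form in a way that does not depend on row order:

```python
import pandas as pd

ingredients = {'Flour': '4 cups', 'Milk': '1 cup', 'Eggs': '2 large', 'Spam': '1 can'}

# Dict keys become the index labels.
from_dict = pd.Series(ingredients, name='Dinner')

# Equivalent construction with explicit values and index.
from_list = pd.Series(
    ['4 cups', '1 cup', '2 large', '1 can'],
    index=['Flour', 'Milk', 'Eggs', 'Spam'],
    name='Dinner',
)

# Sorting both by label makes the comparison independent of row order.
print(from_dict.sort_index().equals(from_list.sort_index()))
print(from_dict['Eggs'])
```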


**Exercise 4**  
Read data from a .csv file into a `DataFrame`.

In [81]:
wine_reviews = pd.read_csv('inputs/wine-reviews/winemag-data_first150k.csv', index_col=0)
wine_reviews.head()

| | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | US | This tremendous 100% varietal wine hails from ... | Martha's Vineyard | 96 | 235.0 | California | Napa Valley | Napa | Cabernet Sauvignon | Heitz |
| 1 | Spain | Ripe aromas of fig, blackberry and cassis are ... | Carodorum Selección Especial Reserva | 96 | 110.0 | Northern Spain | Toro | NaN | Tinta de Toro | Bodega Carmen Rodríguez |
| 2 | US | Mac Watson honors the memory of a wine once ma... | Special Selected Late Harvest | 96 | 90.0 | California | Knights Valley | Sonoma | Sauvignon Blanc | Macauley |
| 3 | US | This spent 20 months in 30% new French oak, an... | Reserve | 96 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Pinot Noir | Ponzi |
| 4 | France | This is the top wine from La Bégude, named aft... | La Brûlade | 95 | 66.0 | Provence | Bandol | NaN | Provence red blend | Domaine de la Bégude |


In [82]:
wine_reviews.tail()

| | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|
| 150925 | Italy | Many people feel Fiano represents southern Ita... | NaN | 91 | 20.0 | Southern Italy | Fiano di Avellino | NaN | White Blend | Feudi di San Gregorio |
| 150926 | France | Offers an intriguing nose with ginger, lime an... | Cuvée Prestige | 91 | 27.0 | Champagne | Champagne | NaN | Champagne Blend | H.Germain |
| 150927 | Italy | This classic example comes from a cru vineyard... | Terre di Dora | 91 | 20.0 | Southern Italy | Fiano di Avellino | NaN | White Blend | Terredora |
| 150928 | France | A perfect salmon shade, with scents of peaches... | Grand Brut Rosé | 90 | 52.0 | Champagne | Champagne | NaN | Champagne Blend | Gosset |
| 150929 | Italy | More Pinot Grigios should taste like this. A r... | NaN | 90 | 15.0 | Northeastern Italy | Alto Adige | NaN | Pinot Grigio | Alois Lageder |


In [83]:
wine_reviews.shape

(150930, 10)

In [84]:
wine_reviews.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 150930 entries, 0 to 150929
Data columns (total 10 columns):
country        150925 non-null object
description    150930 non-null object
designation    105195 non-null object
points         150930 non-null int64
price          137235 non-null float64
province       150925 non-null object
region_1       125870 non-null object
region_2       60953 non-null object
variety        150930 non-null object
winery         150930 non-null object
dtypes: float64(1), int64(1), object(8)
memory usage: 12.7+ MB


In [85]:
dir(wine_reviews)

['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '_AXIS_SLICEMAP',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_wrap__',
 '__bool__',
 '__bytes__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__invert__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 ...

In [86]:
print(wine_reviews)

       country                                        description  \
0           US  This tremendous 100% varietal wine hails from ...   
1        Spain  Ripe aromas of fig, blackberry and cassis are ...   
...        ...                                                ...   
150928  France  A perfect salmon shade, with scents of peaches...   
150929   Italy  More Pinot Grigios should taste like this. A r...   

                                 designation  points  price  \
0                          Martha's Vineyard      96  235.0   
1       Carodorum Selección Especial Reserva      96  110.0   
...                                      ...     ...    ...   
150928                       Grand Brut Rosé      90   52.0   
150929                                   NaN      90   15.0   

                  province     region_1 region_2             variety  \
0               California  Napa Valley     Napa  Cabernet Sauvignon   
1           Northern Spain         Toro      NaN       Tinta d

In [87]:
wine_reviews.items

<bound method DataFrame.iteritems of        country                                        description  \
0           US  This tremendous 100% varietal wine hails from ...   
1        Spain  Ripe aromas of fig, blackberry and cassis are ...   
...        ...                                                ...   
150928  France  A perfect salmon shade, with scents of peaches...   
150929   Italy  More Pinot Grigios should taste like this. A r...   

                                 designation  points  price  \
0                          Martha's Vineyard      96  235.0   
1       Carodorum Selección Especial Reserva      96  110.0   
...                                      ...     ...    ...   
150928                       Grand Brut Rosé      90   52.0   
150929                                   NaN      90   15.0   

                  province     region_1 region_2             variety  \
0               California  Napa Valley     Napa  Cabernet Sauvignon   
1           Northern Spai

In [88]:
answer_q4()

pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv", index_col=0)
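The `index_col=0` argument tells `read_csv` to use the file's first column as the row index instead of loading it as data. The sketch below illustrates the difference on a tiny in-memory CSV. The two-row text is a made-up stand-in, assuming (as `index_col=0` suggests) that the real file starts with an unnamed column of row numbers.

```python
import io

import pandas as pd

# Hypothetical stand-in for the wine reviews file: the first header cell is empty.
csv_text = ",country,points\n0,US,96\n1,Spain,96\n"

# Without index_col, the unnamed first column is loaded as ordinary data.
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, that column becomes the row index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())    # ['Unnamed: 0', 'country', 'points']
print(indexed.columns.tolist())  # ['country', 'points']
```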


**Exercise 5**  
Read data from a .xls sheet into a pandas `DataFrame`.

In [89]:
wic = pd.read_excel('inputs/publicassistance/xls_files_all/WICAgencies2014ytd.xls',
                    sheet_name='Pregnant Women Participating')  # `sheetname` in older pandas
wic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 106 entries, 0 to 105
Data columns (total 14 columns):
WIC PROGRAM -- NUMBER OF PREGNANT WOMEN PARTICIPATING    103 non-null object
Unnamed: 1                                               99 non-null object
Unnamed: 2                                               99 non-null object
Unnamed: 3                                               99 non-null object
Unnamed: 4                                               99 non-null object
Unnamed: 5                                               99 non-null object
Unnamed: 6                                               99 non-null object
Unnamed: 7                                               99 non-null object
Unnamed: 8                                               99 non-null object
Unnamed: 9                                               99 non-null object
Unnamed: 10                                              99 non-null object
Unnamed: 11                                              9

In [90]:
wic.head()

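| | WIC PROGRAM -- NUMBER OF PREGNANT WOMEN PARTICIPATING | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FISCAL YEAR 2014 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Data as of January 05, 2018 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | State Agency or Indian Tribal Organization | 2013-10-01 00:00:00 | 2013-11-01 00:00:00 | 2013-12-01 00:00:00 | 2014-01-01 00:00:00 | 2014-02-01 00:00:00 | 2014-03-01 00:00:00 | 2014-04-01 00:00:00 | 2014-05-01 00:00:00 | 2014-06-01 00:00:00 | 2014-07-01 00:00:00 | 2014-08-01 00:00:00 | 2014-09-01 00:00:00 | Average Participation |
| 4 | Connecticut | 5847 | 5476 | 5274 | 5360 | 5056 | 5319 | 5500 | 5717 | 5703 | 5905 | 5754 | 5624 | 5544.58 |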


In [91]:
answer_q5()

pd.read_excel("../input/publicassistance/xls_files_all/WICAgencies2014ytd.xls", 
sheetname='Pregnant Women Participating')
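The printed answer uses `sheetname`, the spelling accepted by older pandas releases; later releases renamed the keyword to `sheet_name`. The sketch below is one way to cope with either version. `read_sheet` is a hypothetical helper, not part of pandas or `learntools`, and the path is copied from the answer above. Reading `.xls` files also requires the `xlrd` package to be installed.

```python
import inspect
import os

import pandas as pd

# Path and sheet name copied from the exercise; adjust them to your environment.
XLS_PATH = "../input/publicassistance/xls_files_all/WICAgencies2014ytd.xls"
SHEET = "Pregnant Women Participating"


def read_sheet(path, sheet):
    """Hypothetical helper: read one sheet with whichever keyword this pandas accepts."""
    params = inspect.signature(pd.read_excel).parameters
    keyword = "sheet_name" if "sheet_name" in params else "sheetname"
    return pd.read_excel(path, **{keyword: sheet})


# Only attempt the read when the file is actually present.
if os.path.exists(XLS_PATH):
    wic = read_sheet(XLS_PATH, SHEET)
    print(wic.shape)
```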


**Exercise 6**  
Save a `DataFrame` as a .csv file.
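As a starting point, `DataFrame.to_csv` writes a frame to disk. The sketch below is one possible approach rather than the official answer: it saves a small frame to a temporary directory and reads it back to confirm the round trip. The file name `fruit.csv` and the sample frame are placeholders.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Apples': [30], 'Bananas': [21]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fruit.csv')  # hypothetical file name
    df.to_csv(path)                        # the index is written as the first column
    # index_col=0 restores that first column as the index when reading back.
    restored = pd.read_csv(path, index_col=0)

print(restored)
```

Passing `index=False` to `to_csv` would omit the index column, in which case `index_col=0` should be dropped when reading the file back.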