<a href="https://colab.research.google.com/github/Daniel-Benson-Poe/66DaysOfData/blob/main/Pandas_MiniCourse_on_Kaggle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating, Reading, and Writing with Pandas

In [None]:
# We must always import pandas first
# We use pd as a shorthand so we don't have to keep typing out the whole word
import pandas as pd

### The DataFrame

A DataFrame is a table. It contains an array of individual entries, each of which holds a value, and each entry corresponds to a row and a column.

In [None]:
pd.DataFrame(
    {"Yes": [50, 21], 
     "No" : [131, 2]}
)

Unnamed: 0,Yes,No
0,50,131
1,21,2


DataFrame entries are not limited to integers. They can also be strings:

In [None]:
pd.DataFrame(
    {"Bob" : ["I liked it.", "It was awful."], 
     "Sue" : ["Pretty good.", "Bland."]}
)

Unnamed: 0,Bob,Sue
0,I liked it.,Pretty good.
1,It was awful.,Bland.


We declare a DataFrame by passing the constructor a dictionary whose keys are the column names and whose values are lists of entries.

By default, the rows of a new DataFrame are labeled with ascending integers (0, 1, 2, ...), but we can supply our own row labels through the index parameter.

In [None]:
pd.DataFrame(
    {"Bob" : ["I liked it.", "It was awful."], 
     "Sue" : ["Pretty good.", "Bland."]},
     index=["Product A", "Product B"]
)

Unnamed: 0,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland.
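
As a quick check of the constructor syntax described above, the sketch below rebuilds the same table and inspects its column names, row labels, and shape. The variable name reviews_df is only illustrative and does not appear elsewhere in this notebook.

```python
import pandas as pd

# Dictionary keys become column names; each list holds that column's values.
# reviews_df is an illustrative name chosen for this sketch.
reviews_df = pd.DataFrame(
    {"Bob": ["I liked it.", "It was awful."],
     "Sue": ["Pretty good.", "Bland."]},
    index=["Product A", "Product B"],
)

print(list(reviews_df.columns))  # column names, taken from the dictionary keys
print(list(reviews_df.index))    # row labels, taken from the index argument
print(reviews_df.shape)          # (number of rows, number of columns)
```

Every list in the dictionary must have the same length, and the index list must have one label per row.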


### Series

A Series is a sequence of data values. If a DataFrame is a table, a Series is a list.

In [None]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

A Series is, in essence, a single column of a DataFrame. We can therefore assign row labels to a Series with the index parameter, just as before. A Series has no column name, however; it has only one overall name, set with the name parameter.

In [None]:
pd.Series(
    [30, 35, 40],
    index=["2015 Sales", "2016 Sales", "2017 Sales"],
    name="Product A"
)

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
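
To make the relationship between the two structures concrete, here is a minimal sketch showing that selecting one column of a DataFrame returns a Series, and that Series sharing an index can be combined back into a DataFrame. The "Product B" figures are made up for illustration, and pd.concat is just one of several ways to combine Series.

```python
import pandas as pd

# Illustrative data: the "Product B" numbers are invented for this sketch.
sales = pd.DataFrame(
    {"Product A": [30, 35, 40], "Product B": [21, 34, 29]},
    index=["2015 Sales", "2016 Sales", "2017 Sales"],
)

# Selecting a single column returns a Series named after that column.
product_a = sales["Product A"]
print(type(product_a).__name__)
print(product_a.name)

# Series that share an index can be placed side by side as columns.
product_b = pd.Series([21, 34, 29], index=sales.index, name="Product B")
rebuilt = pd.concat([product_a, product_b], axis=1)
print(rebuilt)
```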

A CSV (comma-separated values) file stores a table as plain text: each line is a record, and the values in a record are separated by commas. We can read a CSV file directly into a DataFrame with pd.read_csv().

In [None]:
wine_reviews = pd.read_csv("/content/winemag-data-130k-v2.csv")

In [None]:
# We use the shape attribute to check how large a DataFrame is
wine_reviews.shape

(129971, 14)

The shape above tells us that there are 129,971 records (about 130,000) split across 14 different columns.
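
Because shape is an ordinary tuple, it can be unpacked into separate row and column counts. The sketch below uses a small stand-in DataFrame (named small here purely for illustration) so that it runs without the wine CSV.

```python
import pandas as pd

# A small stand-in table; the wine reviews file is not needed for this sketch.
small = pd.DataFrame({"Yes": [50, 21], "No": [131, 2]})

# shape is a plain tuple: (number of rows, number of columns)
n_rows, n_cols = small.shape
print(f"{n_rows} rows x {n_cols} columns")

# len() of a DataFrame gives the row count; len() of its columns gives the column count.
print(len(small) == n_rows)
print(len(small.columns) == n_cols)
```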

In [None]:
# We can examine the contents of a DataFrame using head(); this will show us the first five rows of data
wine_reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


In [None]:
# The CSV file already contains an index column (the 'Unnamed: 0' column above).
# Passing index_col=0 tells read_csv to use that first column as the index
# instead of creating a new one from scratch, which is the default
wine_reviews = pd.read_csv("/content/winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
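
The effect of index_col=0 can be reproduced without the wine file. The sketch below reads a tiny in-memory CSV (the csv_text string is a made-up stand-in, and io.StringIO is used only so that no file on disk is needed) once with the default settings and once with index_col=0.

```python
import io

import pandas as pd

# A tiny stand-in for a CSV that was saved together with its index.
# The first header field is empty, which is how an unnamed index is written.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Default: pandas builds a fresh 0..n-1 index and keeps the saved index
# as an ordinary column, which it labels "Unnamed: 0".
with_default = pd.read_csv(io.StringIO(csv_text))
print(list(with_default.columns))

# index_col=0: the first column becomes the index instead of a data column.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(with_index.columns))
```

This is why the first head() output above shows an extra "Unnamed: 0" column and the second does not.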


## Your Turn

In [None]:
# Initial Setup
import pandas as pd
print("Setup complete.")

Setup complete.


In the cell below, create a DataFrame fruits that looks like this:

    |Apples  |  Bananas |
    |-------------------|
    | 30     |   21     |
    |-------------------|


In [None]:
fruits = pd.DataFrame(
    {"Apples" : [30],
     "Bananas" : [21]}
)
fruits

Unnamed: 0,Apples,Bananas
0,30,21


Create a DataFrame fruit_sales that matches the diagram below:

````        
             | Apples  | Bananas |    
             |-------------------|    
 2017 Sales  | 35     |   21     |    
             |-------------------|    
 2018 Sales  | 41     |   34     |    
             |-------------------|    
````

In [None]:
fruit_sales = pd.DataFrame(
    {
        "Apples": [35, 41],
        "Bananas": [21, 34]
     },
     index=["2017 Sales", "2018 Sales"]
)

fruit_sales

Unnamed: 0,Apples,Bananas
2017 Sales,35,21
2018 Sales,41,34


Create a variable ingredients with a Series that looks like:

    Flour     4 cups
    Milk       1 cup
    Eggs     2 large
    Spam       1 can
    Name: Dinner, dtype: object

In [None]:
ingredients = pd.Series(
    ["4 cups", "1 cup", "2 large", "1 can"],
    index = ["Flour", "Milk", "Eggs", "Spam"],
    name="Dinner"
)
ingredients

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object

Read the CSV dataset of wine reviews into a DataFrame called reviews.

In [None]:
reviews = pd.read_csv("/content/winemag-data-130k-v2.csv", index_col=0)
reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


Run the cell below to create and display a DataFrame called animals:

In [None]:
animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
animals

Unnamed: 0,Cows,Goats
Year 1,12,22
Year 2,20,19


In the cell below, write code to save this DataFrame to disk as a csv file with the name cows_and_goats.csv.

In [None]:
animals.to_csv('/content/cows_and_goats.csv')
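
To confirm what to_csv writes, the following sketch saves the same DataFrame and reads it back. It writes to a temporary directory rather than /content, which is an assumption made so the example runs outside Colab; the names csv_path and restored are illustrative.

```python
import os
import tempfile

import pandas as pd

animals = pd.DataFrame(
    {"Cows": [12, 20], "Goats": [22, 19]},
    index=["Year 1", "Year 2"],
)

# Write to a temporary directory so the sketch does not depend on /content.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "cows_and_goats.csv")
    animals.to_csv(csv_path)  # the index is written as the first column

    # Reading with index_col=0 restores the row labels as the index.
    restored = pd.read_csv(csv_path, index_col=0)

print(restored)
```

Since to_csv writes the row labels by default, reading the file back without index_col=0 would turn them into an ordinary "Unnamed: 0" column.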