There are two main classes in `Pandas`:
* **DataFrame**: the equivalent of a table, with rows and one or more columns (each column can hold its own data type)
* **Series**: a one-dimensional sequence of values, which corresponds to a single column and is displayed differently from a DataFrame
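To see how the two classes relate, here is a minimal sketch reusing the numbers from the `data1` example below: selecting one column of a DataFrame gives back a Series, which has only one dimension.

```python
import pandas as pd

# Same values as data1: two columns ("Yes", "No"), two rows
df = pd.DataFrame({"Yes": [50, 21], "No": [131, 2]})

# Selecting a single column returns a Series, not a DataFrame
col = df["Yes"]

print(type(df).__name__)   # DataFrame
print(type(col).__name__)  # Series
print(df.shape)            # (2, 2): rows, columns
print(col.shape)           # (2,): a single dimension
```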

In [2]:
import pandas as pd


data1 = pd.DataFrame({
    "Yes" : [50,21],
    "No" : [131, 2]
})

data1

|   | Yes | No  |
|---|-----|-----|
| 0 | 50  | 131 |
| 1 | 21  | 2   |


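There are many ways to create DataFrames in `Pandas`; one of them is passing a dictionary, as above, where each key becomes a column name and each list becomes that column's values. The values don't have to be numbers; you can also store strings: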

In [3]:
data2 = pd.DataFrame({
    "Bob" : ["I liked it", "It was awful"],
    "Sue" : ["Pretty good", "Bland"]
})

data2

|   | Bob          | Sue         |
|---|--------------|-------------|
| 0 | I liked it   | Pretty good |
| 1 | It was awful | Bland       |
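The dictionary form used above is only one option. A minimal sketch of two other common constructors, building the same two-row review table: a list of row dictionaries, and a list of rows combined with the `columns=` parameter.

```python
import pandas as pd

# 1) Dictionary of lists: keys become column names (as in the cells above)
from_dict = pd.DataFrame({"Bob": ["I liked it", "It was awful"],
                          "Sue": ["Pretty good", "Bland"]})

# 2) List of dictionaries: each dictionary describes one row
from_records = pd.DataFrame([
    {"Bob": "I liked it", "Sue": "Pretty good"},
    {"Bob": "It was awful", "Sue": "Bland"},
])

# 3) List of lists: each inner list is one row; column names passed separately
from_rows = pd.DataFrame([["I liked it", "Pretty good"],
                          ["It was awful", "Bland"]],
                         columns=["Bob", "Sue"])

# All three constructors produce the same table
print(from_dict.equals(from_records) and from_dict.equals(from_rows))  # True
```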


By default, each row is labeled with an incrementing integer, starting at 0. We can change these row labels with the optional `index` parameter, which takes an iterable such as a list or a tuple, with one label per row.

In [5]:
data2 = pd.DataFrame({
    "Bob" : ["I liked it", "It was awful"],
    "Sue" : ["Pretty good", "Bland"]
}, index=("Product A", "Product B"))
data2

|           | Bob          | Sue         |
|-----------|--------------|-------------|
| Product A | I liked it   | Pretty good |
| Product B | It was awful | Bland       |
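The `index` argument must supply exactly one label per row. A small sketch (the `review_data` name is only for illustration): two labels for two rows work, while a single label for two rows makes pandas raise a `ValueError` instead of guessing.

```python
import pandas as pd

review_data = {"Bob": ["I liked it", "It was awful"],
               "Sue": ["Pretty good", "Bland"]}

# One label per row: the frame is built with these row labels
ok = pd.DataFrame(review_data, index=("Product A", "Product B"))
print(list(ok.index))  # ['Product A', 'Product B']

# Too few labels for the number of rows: construction fails
mismatch_error = None
try:
    pd.DataFrame(review_data, index=["Product A"])
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```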


Next, we introduce `Series`, which is a one-dimensional sequence of data values. If a DataFrame is a table, a Series is a list. In fact, you can create one from just a list.

In [6]:
pd.Series([1,2,3,4,5])

0    1
1    2
2    3
3    4
4    5
dtype: int64
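The left-hand column in the output above is the Series' default index. A minimal sketch of inspecting it: with no `index` argument, pandas labels the values 0, 1, 2, … and infers the dtype from the values.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

print(s.index)     # RangeIndex(start=0, stop=5, step=1)
print(s.tolist())  # [1, 2, 3, 4, 5]
print(s.dtype)     # int64
```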

A `Series` has no column names; instead, it has a single `name` for the whole Series. Since a `Series` is essentially one column, you can change its row labels with `index`, just as with a `DataFrame`.

In [8]:
pd.Series([30,40,50], index=["2015 Sales", "2016 Sales", "2017 Sales"], name='Product A')

2015 Sales    30
2016 Sales    40
2017 Sales    50
Name: Product A, dtype: int64
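The Series `name` is what becomes the column header when the Series is placed in a DataFrame. A minimal sketch using `Series.to_frame()`, and the reverse direction of selecting that column again:

```python
import pandas as pd

sales = pd.Series([30, 40, 50],
                  index=["2015 Sales", "2016 Sales", "2017 Sales"],
                  name="Product A")

# Convert to a one-column DataFrame: the Series name becomes the column header
table = sales.to_frame()
print(list(table.columns))  # ['Product A']

# Selecting that column gives back a Series named after the column header
col = table["Product A"]
print(col.name)             # Product A
print(col.equals(sales))    # True
```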

Creating a `DataFrame` from scratch can come in handy, but it takes a lot of work for anything beyond a small example. That's why we usually import data that already comes organized, for example from a `.csv` file.

### What is a CSV File?

A **CSV (Comma-Separated Values)** file is a simple text file that stores tabular data (like rows and columns in a spreadsheet). Each line in the file represents a row of data, and values within a row are separated by commas.

CSV files are commonly used to transfer data between different programs, such as Excel, databases, or data analysis tools like Python.

#### Example of a CSV File as a Table

| Name    | Age | City        |
|---------|-----|-------------|
| Alice   | 30  | New York    |
| Bob     | 25  | Los Angeles |
| Charlie | 22  | Chicago     |
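Written out as raw text, the table above would look like the string in this sketch: one line per row, commas between values, and the column names on the first line. Since `pd.read_csv()` accepts file-like objects, `io.StringIO` lets us parse that text directly without creating a file.

```python
import io
import pandas as pd

# The example table as raw CSV text
csv_text = """Name,Age,City
Alice,30,New York
Bob,25,Los Angeles
Charlie,22,Chicago
"""

# Parse the text as if it were a file on disk
people = pd.read_csv(io.StringIO(csv_text))
print(people)
print(people.shape)  # (3, 3)
```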


In [11]:
data = pd.read_csv("./Data/winemag-data-130k-v2.csv")
data.shape

(129971, 14)

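As you can see, we loaded the file with `pd.read_csv()`, which converts it into a `DataFrame`. Its `.shape` attribute reports `(rows, columns)`, so this dataset has 129,971 rows and 14 columns. We can use `.head()` to see the first 5 rows.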

In [12]:
data.head()

|   | Unnamed: 0 | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
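`.head()` returns the first 5 rows by default, but it also accepts a row count, and `.tail()` does the same from the end. A minimal sketch on a small stand-in DataFrame (the `points` values are made up for illustration, not taken from the wine data):

```python
import pandas as pd

# Hypothetical stand-in data: 7 rows, 1 column
df = pd.DataFrame({"points": [87, 88, 90, 85, 91, 86, 89]})

print(df.shape)    # (7, 1)
print(df.head())   # first 5 rows (the default)
print(df.head(2))  # first 2 rows
print(df.tail(3))  # last 3 rows
```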


Notice the `Unnamed: 0` column in the output of `data.head()`: the `.csv` file already has a built-in index column (one without a header), so we can use it as the row labels instead of having `Pandas` create a new one from scratch.

In [17]:
data = pd.read_csv("./Data/winemag-data-130k-v2.csv", index_col=0)
data.head()

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |


`index_col=0` tells `read_csv()` to use the first column (position 0) as the row labels (the index) instead of treating it as ordinary data.
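The `Unnamed: 0` behavior can be reproduced on a tiny scale. This sketch uses a hypothetical three-row CSV with the same layout as the wine file (a header-less first column of row numbers) and reads it both with and without `index_col=0`:

```python
import io
import pandas as pd

# Hypothetical CSV: the first header field is empty, like in the wine file
csv_text = """,country,points
0,Italy,87
1,Portugal,87
2,US,87
"""

# Without index_col, the header-less column is kept as data named "Unnamed: 0"
plain = pd.read_csv(io.StringIO(csv_text))
print(list(plain.columns))    # ['Unnamed: 0', 'country', 'points']

# With index_col=0, that column becomes the row labels instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))  # ['country', 'points']
print(list(indexed.index))    # [0, 1, 2]
```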

Now comes the exercise section.

In the cell below, create a DataFrame `fruits` that looks like this:

![alt](Data/image1.png)

In [23]:
import pandas as pd

fruits = pd.DataFrame({
    "Apples" : [30],
    "Bananas" : [21]
})
fruits

|   | Apples | Bananas |
|---|--------|---------|
| 0 | 30     | 21      |


For your next exercise, create a DataFrame `fruit_sales` that matches the diagram below:

![alt](Data/image2.png)

In [24]:
fruit_sales = pd.DataFrame({
    "Apples" : [35,41],
    "Bananas" : [21,34]
},
    index=["2017 Sales", "2018 Sales"])
fruit_sales

|            | Apples | Bananas |
|------------|--------|---------|
| 2017 Sales | 35     | 21      |
| 2018 Sales | 41     | 34      |


Create a variable `ingredients` with a Series that looks like:

```
Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object
```

In [25]:
ingredients = pd.Series(["4 cups", "1 cup", "2 large", "1 can"],
               index=["Flour", "Milk", "Eggs", "Spam"],
               name="Dinner")
ingredients

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object

Read the following CSV dataset of wine reviews into a DataFrame called `reviews`:

![alt](Data/image3.png)

The CSV file is located at `Data/winemag-data_first150k.csv`.


In [29]:
import pandas as pd

reviews = pd.read_csv("Data/winemag-data_first150k.csv", index_col=0)
reviews.head()

|   | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | US | This tremendous 100% varietal wine hails from ... | Martha's Vineyard | 96 | 235.0 | California | Napa Valley | Napa | Cabernet Sauvignon | Heitz |
| 1 | Spain | Ripe aromas of fig, blackberry and cassis are ... | Carodorum Selección Especial Reserva | 96 | 110.0 | Northern Spain | Toro |  | Tinta de Toro | Bodega Carmen Rodríguez |
| 2 | US | Mac Watson honors the memory of a wine once ma... | Special Selected Late Harvest | 96 | 90.0 | California | Knights Valley | Sonoma | Sauvignon Blanc | Macauley |
| 3 | US | This spent 20 months in 30% new French oak, an... | Reserve | 96 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Pinot Noir | Ponzi |
| 4 | France | This is the top wine from La Bégude, named aft... | La Brûlade | 95 | 66.0 | Provence | Bandol |  | Provence red blend | Domaine de la Bégude |


Run the cell below to create and display a DataFrame called `animals`:

In [30]:
animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
animals

|        | Cows | Goats |
|--------|------|-------|
| Year 1 | 12   | 22    |
| Year 2 | 20   | 19    |


In the cell below, write code to save this DataFrame to disk as a CSV file with the name `cows_and_goats.csv`.

In [None]:
animals.to_csv("Data/cows_and_goats.csv", index=False)  # index=False leaves the row labels ('Year 1', 'Year 2') out of the file; the default (index=True) writes them as the first column
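To see exactly what the `index` parameter changes without touching the disk, this sketch relies on the fact that `to_csv()` returns the CSV text as a string when no path is given. It also shows that the default output, read back with `index_col=0`, restores the original DataFrame including its row labels.

```python
import io
import pandas as pd

animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]},
                       index=['Year 1', 'Year 2'])

# No path argument: to_csv returns the CSV text instead of writing a file
with_index = animals.to_csv()                 # default index=True
without_index = animals.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Cows,Goats
print(without_index.splitlines()[0])  # Cows,Goats

# Reading the default output back with index_col=0 restores the row labels
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored.equals(animals))       # True
```

Which setting to use depends on whether the row labels carry information: here they hold the years, so keeping them (the default) preserves data that `index=False` would discard.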