In [34]:
import pandas as pd
import numpy as np

There are two core objects in pandas: the DataFrame and the Series.
A DataFrame is a table. It contains an array of individual entries, each of which has a certain value.
Each entry corresponds to a row (or record) and a column.
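A minimal sketch of that row/column addressing. It uses `DataFrame.at`, a label-based accessor not otherwise covered in these notes, to read the single entry at row label `0`, column `'No'`. The values are illustrative.

```python
import pandas as pd

# Two columns ("Yes", "No") and two rows labelled 0 and 1
df = pd.DataFrame({'Yes': [1, 2], 'No': [40, 50]})

# Each entry is addressed by (row label, column label)
entry = df.at[0, 'No']
print(entry)  # 40
```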

In [14]:
df1 = pd.DataFrame({'Yes' : [1, 2], 'No' : [40, np.nan]})
df1

| | Yes | No |
|---|---|---|
| 0 | 1 | 40.0 |
| 1 | 2 | NaN |


In [15]:
df1 = pd.DataFrame({'Yes' : ['Haha'], 'No' : ['Hehe']})
df1

| | Yes | No |
|---|---|---|
| 0 | Haha | Hehe |


The standard way of constructing a new DataFrame is to pass `pd.DataFrame()` a dictionary whose keys are the column names and whose values are lists of entries. The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels.
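A quick check of those defaults. This is a sketch in which `df` is only an illustrative name: the column labels come from the dictionary keys, and the row labels form a `RangeIndex` counting up from 0.

```python
import pandas as pd

df = pd.DataFrame({'Yes': [1, 2], 'No': [40, 50]})

# Column labels come from the dictionary keys (in insertion order)
print(list(df.columns))  # ['Yes', 'No']
# Row labels default to an ascending count from 0
print(list(df.index))    # [0, 1]
```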

To change these default row labels, we can pass an `index` parameter to the constructor:

In [17]:
df2 = pd.DataFrame({'Part 1' : ['pt1', 'pt2'], 'Part 2' : ['pt3', 'pt4']}, index=['A', 'B'])
df2

| | Part 1 | Part 2 |
|---|---|---|
| A | pt1 | pt3 |
| B | pt2 | pt4 |
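The `index` list needs exactly one label per row. The sketch below assumes a reasonably recent pandas. A length mismatch is rejected with a `ValueError`, and the exact message text varies between versions.

```python
import pandas as pd

try:
    # Two values in the column, but only one row label
    pd.DataFrame({'Part 1': ['pt1', 'pt2']}, index=['A'])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```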


A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. In fact, you can create one from nothing more than a list (or, as here, a NumPy array):

In [19]:
pd.Series(np.arange(1, 4))

0    1
1    2
2    3
dtype: int32
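The `int32` above most likely reflects NumPy's default integer width on the machine this notebook ran on (typically Windows with NumPy older than 2.0). Note that the list-based Series further down shows `int64`. If a fixed width matters, a minimal sketch is to request it explicitly with the `dtype` parameter:

```python
import numpy as np
import pandas as pd

# Integer width chosen by NumPy (platform dependent for np.arange)
default = pd.Series(np.arange(1, 4))

# Request a 64-bit integer dtype explicitly
explicit = pd.Series(np.arange(1, 4), dtype='int64')

print(default.dtype, explicit.dtype)
```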

A Series is, in essence, *a single column of a DataFrame*. So you can assign row labels to the Series the same way as before, using an `index` parameter. However, a Series does not have column names; it has only one overall `name`:

In [20]:
pd.Series([30, 50, 50], index=['2015', '2016', '2017'], name='product A')

2015    30
2016    50
2017    50
Name: product A, dtype: int64
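Going the other way, several Series that share the same index labels can be combined into a DataFrame, with one column per Series. Selecting a column then hands back a Series named after that column. This is a small sketch, and the `product B` figures are made-up values for illustration:

```python
import pandas as pd

years = ['2015', '2016', '2017']
product_a = pd.Series([30, 50, 50], index=years, name='product A')
product_b = pd.Series([10, 20, 35], index=years, name='product B')  # made-up values

# Put the Series side by side: each one becomes a column
sales = pd.DataFrame({'product A': product_a, 'product B': product_b})

# Pulling a column back out gives a Series again, named after the column
column = sales['product A']
print(type(column).__name__, column.name)  # Series product A
```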

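The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together".

Being able to create a DataFrame or Series by hand is handy, but most of the time we work with data that already exists. The most basic and common format for such data is the CSV file ("Comma-Separated Values"), which `pd.read_csv()` reads into a DataFrame: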

In [21]:
dataset = pd.read_csv('Data Sources/winemag-data-130k-v2.csv')
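`pd.read_csv()` is not limited to paths on disk, because it also accepts a file-like object. In this minimal self-contained sketch, `io.StringIO` stands in for a file, and the few values are copied from the first rows of the wine dataset. An empty field is read as `NaN`.

```python
import io
import pandas as pd

# A tiny CSV held in memory; io.StringIO stands in for a file on disk
csv_text = """country,points,price
Italy,87,
Portugal,87,15.0
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```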

In [22]:
dataset.head()

| | Unnamed: 0 | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | 2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |


In [23]:
dataset.shape

(129971, 14)

The `pd.read_csv()` function is well-endowed, with over 30 optional parameters you can specify. For example, this CSV file has a built-in index in its first column, which pandas did not pick up on automatically; it loaded that index as an ordinary column named `Unnamed: 0` instead. To make pandas use that column as the index (instead of creating a new one from scratch), we can specify `index_col`:

In [28]:
dataset = pd.read_csv('Data Sources/winemag-data-130k-v2.csv', index_col=0)
# index_col=0 uses the CSV's first column as the row index

In [29]:
dataset.head()

| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
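The effect of `index_col` can be reproduced on a tiny in-memory file. In this sketch the CSV text is made up, but it is shaped like the wine data. When the first header field is empty, pandas names that column `Unnamed: 0` unless told to use it as the index.

```python
import io
import pandas as pd

# The first header field is empty: the file carries its own row index
csv_text = """,country,points
0,Italy,87
1,Portugal,87
"""

without = pd.read_csv(io.StringIO(csv_text))
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(without.columns))     # ['Unnamed: 0', 'country', 'points']
print(list(with_index.columns))  # ['country', 'points']
```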


# Exercises from Kaggle

## Python Pandas - Options and Customization

pandas provides an API to customize some aspects of its behavior; the display-related options are the most commonly used.

The API is composed of five functions:

- `get_option(param)`: takes a single parameter and returns its current value.
- `set_option(param, value)`: takes two arguments and sets the parameter to the given value.
- `reset_option(param)`: takes an argument and sets its value back to the default.
- `describe_option(param)`: prints the description of the option.
- `option_context(param, value, ...)`: a context manager that sets options temporarily inside a `with` statement. The previous values are restored automatically when you exit the `with` block.

For example, `display.max_rows` is the upper limit on the number of rows pandas prints for an object; longer objects are shown in a truncated view.


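Some of the display parameters are:

1. `display.max_rows`: maximum number of rows to print before switching to a truncated view
2. `display.max_columns`: maximum number of columns to print before switching to a truncated view
3. `display.expand_frame_repr`: whether wide DataFrames are printed across multiple "pages" (wrapped lines)
4. `display.max_colwidth`: maximum width, in characters, of a column's contents
5. `display.precision`: number of decimal places shown for floating-point numbers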
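Two of these options can be observed directly by capturing `repr()` inside `pd.option_context`. This is a sketch with illustrative data: lowering `display.precision` shortens the printed float, and setting `display.max_rows` below the Series length produces a truncated view marked with dots.

```python
import pandas as pd

values = pd.Series([3.14159])

with pd.option_context('display.precision', 2):
    short = repr(values)   # float printed with 2 decimal places
full = repr(values)        # back to the default precision (6)

many = pd.Series(range(10))
with pd.option_context('display.max_rows', 4):
    truncated = repr(many)  # 10 rows > max_rows, so the middle is elided

print(short)
print(full)
print(truncated)
```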


The **learntools** folder contains a Python package that provides feedback to users in Kaggle Learn courses.

In [46]:
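# Read the current values of two display options
print(pd.get_option("display.max_rows"))
print(pd.get_option("display.max_columns"))

# Change max_rows, then restore its default
pd.set_option("display.max_rows", 10)
print(pd.get_option("display.max_rows"))
pd.reset_option("display.max_rows")
print(pd.get_option("display.max_rows"))

# Print the option's documentation
pd.describe_option("display.max_rows")

# Override max_rows only inside the with block
with pd.option_context("display.max_rows", 5):
    print(pd.get_option("display.max_rows"))

# Outside the block the previous value is back
print(pd.get_option("display.max_rows"))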

60
20
10
60
display.max_rows : int
    If max_rows is exceeded, switch to truncate view. Depending on
    `large_repr`, objects are either centrally truncated or printed as
    a summary view. 'None' value means unlimited.

    In case python/IPython is running in a terminal and `large_repr`
    equals 'truncate' this can be set to 0 and pandas will auto-detect
    the height of the terminal and print a truncated object which fits
    the screen height. The IPython notebook, IPython qtconsole, or
    IDLE do not run in a terminal and hence it is not possible to do
    correct auto-detection.
    [default: 60] [currently: 60]


5
60


In [48]:
# pd.set_option('display.max_rows', 5)
# from learntools.core import binder; binder.bind(globals())
# from learntools.pandas.creating_reading_and_writing import *
# print("Setup complete.")

1. Create a DataFrame `fruits` that looks like this:
![](https://i.imgur.com/Ax3pp2A.png)

In [52]:
fruits = pd.DataFrame({'Apples': [20], 'Bananas': [21]})
fruits

| | Apples | Bananas |
|---|---|---|
| 0 | 20 | 21 |


2. Create a DataFrame `fruit_sales` that matches the diagram below:

![](https://i.imgur.com/CHPn7ZF.png)

In [55]:
fruit_sales = pd.DataFrame({'Apples' : [35, 41], 'Bananas' : [21, 34]}, index=['2017 Sales', '2018 Sales'])
fruit_sales

| | Apples | Bananas |
|---|---|---|
| 2017 Sales | 35 | 21 |
| 2018 Sales | 41 | 34 |
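The same table can also be built row by row. This sketch passes a list of rows together with explicit `columns` and `index` labels instead of a column dictionary, and `DataFrame.equals` confirms that both constructions match.

```python
import pandas as pd

by_column = pd.DataFrame({'Apples': [35, 41], 'Bananas': [21, 34]},
                         index=['2017 Sales', '2018 Sales'])

# Same table built row by row: a list of rows plus explicit column labels
by_row = pd.DataFrame([[35, 21], [41, 34]],
                      columns=['Apples', 'Bananas'],
                      index=['2017 Sales', '2018 Sales'])

print(by_row.equals(by_column))  # True
```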


3. Create a variable `ingredients` with a Series that looks like:

```
Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object
```

In [56]:
ingredients = pd.Series(['4 cups', '1 cup', '2 large', '1 can'], 
                        index=['Flour', 'Milk', 'Eggs', 'Spam'], 
                        name='Dinner')

ingredients

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object
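Because the ingredient names are the index labels, they work much like dictionary keys for looking up a value. This is a minimal sketch that rebuilds the same `ingredients` Series so it runs on its own:

```python
import pandas as pd

ingredients = pd.Series(['4 cups', '1 cup', '2 large', '1 can'],
                        index=['Flour', 'Milk', 'Eggs', 'Spam'],
                        name='Dinner')

# Index labels work like dictionary keys
print(ingredients['Eggs'])  # 2 large
```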

4. Read the following CSV dataset of wine reviews into a DataFrame called `reviews`:

![](https://i.imgur.com/74RCZtU.png)

On Kaggle, the filepath to the CSV file is `../input/wine-reviews/winemag-data_first150k.csv`; this notebook reads a local copy from `Data Sources/`. The first few lines look like:

```
,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,"This tremendous 100% varietal wine[...]",Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and[...]",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
```

In [60]:
reviews = pd.read_csv('Data Sources/winemag-data_first150k.csv', index_col=0)
reviews

| | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | US | This tremendous 100% varietal wine hails from ... | Martha's Vineyard | 96 | 235.0 | California | Napa Valley | Napa | Cabernet Sauvignon | Heitz |
| 1 | Spain | Ripe aromas of fig, blackberry and cassis are ... | Carodorum Selección Especial Reserva | 96 | 110.0 | Northern Spain | Toro | NaN | Tinta de Toro | Bodega Carmen Rodríguez |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 150928 | France | A perfect salmon shade, with scents of peaches... | Grand Brut Rosé | 90 | 52.0 | Champagne | Champagne | NaN | Champagne Blend | Gosset |
| 150929 | Italy | More Pinot Grigios should taste like this. A r... | NaN | 90 | 15.0 | Northeastern Italy | Alto Adige | NaN | Pinot Grigio | Alois Lageder |
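Without the full file, the same call can be checked on the sample lines quoted in the exercise (descriptions truncated with `[...]` exactly as shown). This is a sketch: `io.StringIO` stands in for the file, and the optional `nrows` parameter limits how many data rows are read.

```python
import io
import pandas as pd

# Header plus the two sample rows quoted in the exercise
sample = """,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,"This tremendous 100% varietal wine[...]",Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,Spain,"Ripe aromas of fig, blackberry and[...]",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
"""

# index_col=0 turns the unnamed first column into the row index
reviews_sample = pd.read_csv(io.StringIO(sample), index_col=0)

# nrows=1 reads only the first data row
first_row = pd.read_csv(io.StringIO(sample), index_col=0, nrows=1)

print(reviews_sample.shape, first_row.shape)  # (2, 10) (1, 10)
```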


5. Run the cell below to create and display a DataFrame called `animals`:

In [61]:
animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
animals

| | Cows | Goats |
|---|---|---|
| Year 1 | 12 | 22 |
| Year 2 | 20 | 19 |


In the cell below, write code to save this DataFrame to disk as a CSV file with the name `cows_and_goats.csv`.

In [64]:
animals.to_csv('cows_and_goats.csv')
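To check what `to_csv` writes without touching the disk, note that calling it with no path returns the CSV text as a string. In this sketch the text is read back with `index_col=0`, which restores the row labels, so the round trip reproduces the original DataFrame.

```python
import io
import pandas as pd

animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = animals.to_csv()
print(csv_text)

# Reading it back with index_col=0 restores the row labels
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored.equals(animals))  # True
```

If the row labels should not be saved at all, `to_csv` also accepts `index=False`.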