In [2]:
import pandas as pd

In [3]:
# DataFrames are like dictionaries: the keys are the column names, and the values are the lists of entries in each column

pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

Unnamed: 0,Yes,No
0,50,131
1,21,2


In [4]:
# If we want to add row labels, we can add an 'index'

pd.DataFrame({'Bob' : ['I liked it.', 'It was awful'],
              'Sue' : ['Best movie ever!', 'It was okay'],
              'Joe' : ['I liked it.', 'It was awful']},
             index = ['Product A', 'Product B'])

Unnamed: 0,Bob,Sue,Joe
Product A,I liked it.,Best movie ever!,I liked it.
Product B,It was awful,It was okay,It was awful
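
As a minimal sketch of how the dictionary keys and the index end up in the finished table, the cell below rebuilds a smaller version of the DataFrame above and inspects its labels. The variable names (reviews, column_names, row_labels) are only illustrative.

```python
import pandas as pd

# Dictionary keys become column names; `index` supplies the row labels.
reviews = pd.DataFrame(
    {"Bob": ["I liked it.", "It was awful"],
     "Sue": ["Best movie ever!", "It was okay"]},
    index=["Product A", "Product B"],
)

column_names = list(reviews.columns)
row_labels = list(reviews.index)

print(column_names)   # ['Bob', 'Sue']
print(row_labels)     # ['Product A', 'Product B']
print(reviews.shape)  # (2, 2): two rows, two columns
```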


In [5]:
# A Series is different: it is a sequence of data values.
# If a DataFrame is a table, a Series is a list.

pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [6]:
# A Series is essentially a single column of a DataFrame. It has an index, and it can be given a name.

# One difference is that a Series has no column names, just one overall name

pd.Series([30, 35, 40], index = ['Bob', 'Sue', 'Joe'], name = 'Age')

Bob    30
Sue    35
Joe    40
Name: Age, dtype: int64
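
To make the "single column" idea concrete, here is a small sketch that turns the Series above into a one-column DataFrame and back. It assumes nothing beyond the Age example; the names people and age_column are illustrative.

```python
import pandas as pd

ages = pd.Series([30, 35, 40], index=["Bob", "Sue", "Joe"], name="Age")

# Converting the Series to a DataFrame turns its name into the column name.
people = ages.to_frame()
print(list(people.columns))        # ['Age']

# Selecting that column gives back a Series with the same values and index.
age_column = people["Age"]
print(type(age_column).__name__)   # Series
print(age_column.equals(ages))     # True
```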

### Reading Data Files
Creating data by hand is useful, but most of the time you will work with data that already exists, most commonly in the form of CSV files.

CSV stands for Comma Separated Values. It is a common format for data files.
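
As a rough illustration of what a CSV file contains, the sketch below parses a few lines of comma-separated text held in a string rather than a file on disk. The three-column content is a made-up miniature, not the actual dataset file.

```python
import io
import pandas as pd

# Hypothetical CSV text: a header line, then one line per row.
csv_text = "country,points,price\nItaly,87,\nPortugal,87,15.0\nUS,87,14.0\n"

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                   # (3, 3)
print(sample["price"].isna().sum())   # 1: the empty field is read as NaN
```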

In [7]:
# Load the wine reviews dataset from a CSV file
wine_reviews = pd.read_csv("../data/winemag-data-130k-v2.csv")

wine_reviews.shape

(129971, 14)

The shape attribute tells us that the data has 129,971 rows (about 130k) and 14 columns.

We can also look at the first 5 rows by using the .head() method.

In [8]:
wine_reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


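There is also a lot you can do with the read_csv function. One option is index_col, which tells pandas to use a column from the file as the index instead of creating a new one. Here the first column of the CSV already holds the row index.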

In [9]:
wine_reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col = 0)
wine_reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
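
The effect of index_col can be seen on a tiny example. The sketch below uses a hypothetical CSV string whose first column is an unnamed, saved row index, and reads it with and without index_col=0.

```python
import io
import pandas as pd

# Hypothetical CSV text whose first column has an empty header (a saved index).
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n2,US,87\n"

default = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without index_col, the saved index shows up as an extra data column.
print(list(default.columns))   # ['Unnamed: 0', 'country', 'points']

# With index_col=0, that column becomes the row index instead.
print(list(indexed.columns))   # ['country', 'points']
print(default.shape, indexed.shape)   # (3, 3) (3, 2)
```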


# Indexing, Selecting & Assigning

Pandas builds on features that Python has by default and uses them to offer convenient ways of selecting data.

For one, we can access a property of an object as an attribute. For example, we can access the country column of a DataFrame as an attribute.

In [None]:
wine_reviews.country

0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [15]:
# Similar to Python dictionaries, we can get values with the indexing operator ( [] ). The same works for columns in DataFrames.
wine_reviews['country']
wine_reviews['country'][0]

'Italy'
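
The two access styles can be compared side by side. This is a minimal sketch on a small hypothetical frame (the wines variable and its 'taster name' column, which contains a space on purpose, are made up for illustration).

```python
import pandas as pd

# Small hypothetical frame; 'taster name' is not a valid Python identifier.
wines = pd.DataFrame({"country": ["Italy", "Portugal"],
                      "taster name": ["Roger Voss", "Paul Gregutt"]})

# For a simple column name, both forms return the same Series.
print(wines.country.equals(wines["country"]))   # True

# Only bracket access works for names with spaces or other special characters.
print(wines["taster name"][0])                  # Roger Voss

# Chained brackets select the column first, then the row labelled 0.
first_country = wines["country"][0]
print(first_country)                            # Italy
```

Bracket access is the safer default, since attribute access also fails when a column name clashes with an existing DataFrame attribute or method.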

### Indexing in Pandas

Indexing in pandas works like the rest of Python. Pandas also has its own accessor operators, 'loc' and 'iloc', which are used for more advanced indexing.
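
As a quick preview, here is a minimal sketch of the difference between the two, reusing the small Age Series from earlier: iloc selects by integer position, while loc selects by label.

```python
import pandas as pd

ages = pd.Series([30, 35, 40], index=["Bob", "Sue", "Joe"], name="Age")

print(ages.iloc[0])      # 30: first entry by position
print(ages.loc["Sue"])   # 35: entry with the label 'Sue'

# Slices differ: iloc excludes the end position, loc includes the end label.
by_position = list(ages.iloc[0:2])
by_label = list(ages.loc["Bob":"Sue"])
print(by_position)       # [30, 35]
print(by_label)          # [30, 35]
```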

#### Index-based Selection