In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

In [2]:
pd.set_option("display.max_rows",5)

The two core objects in pandas are the DataFrame and the Series.

### DataFrame

In [3]:
pd.DataFrame({'Yes': [50, 21], "No": [90,78]})

,Yes,No
0,50,90
1,21,78


In [5]:
pd.DataFrame({'Bob':['I LIKED IT HERE','Nice location','NDVI'],
              'Sue': ['Home','Homer','MSAVI']})

,Bob,Sue
0,I LIKED IT HERE,Home
1,Nice location,Homer
2,NDVI,MSAVI


- We are using the `pd.DataFrame()` constructor to create DataFrame objects.

- The syntax for declaring one is a dictionary whose keys are the column names (Bob, Sue) and whose values are lists of entries, one list per column.

- By default, the DataFrame uses an ascending count from 0 for the row labels. Sometimes we want to assign these labels ourselves.

- The list of row labels used in a DataFrame is known as an index. We can assign values to it using the **index** parameter in the constructor.


In [7]:
pd.DataFrame({'Bob': [10,90,989], 
             'Sue': [99,88,77]}, index = ['Product A', 'Product B','Product C'])

,Bob,Sue
Product A,10,99
Product B,90,88
Product C,989,77
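
To make the role of the index more concrete, here is a minimal sketch that compares a DataFrame built with custom row labels against one that falls back to the default labels. The variable names `sales` and `default` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Same data as above, with custom row labels
sales = pd.DataFrame({'Bob': [10, 90, 989],
                      'Sue': [99, 88, 77]},
                     index=['Product A', 'Product B', 'Product C'])

# The row labels are stored on the .index attribute
print(sales.index.tolist())

# Without an index argument, pandas falls back to 0, 1, 2, ...
default = pd.DataFrame({'Bob': [10, 90, 989], 'Sue': [99, 88, 77]})
print(default.index.tolist())
```

The data in both frames is identical; only the row labels differ.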


### Series

- A Series is a sequence of data values. If a DataFrame is a table, a Series is a list.

In [8]:
pd.Series([9,90,7.9,0.987])

0     9.000
1    90.000
2     7.900
3     0.987
dtype: float64

- A Series is, in essence, a single column of a DataFrame.

- We can assign row labels to a Series the same way as before, using an **index** parameter. However, a Series object does not have a column name; it only has one overall **name**.

In [9]:
pd.Series([0.45, 0.55, 0.68, 0.76], index = ['NDVI_1','NDVI_2','NDVI_3','NDVI_4'], name = 'NDVI')

NDVI_1    0.45
NDVI_2    0.55
NDVI_3    0.68
NDVI_4    0.76
Name: NDVI, dtype: float64

- Series and DataFrame objects are closely related. It is helpful to think of a DataFrame as a bunch of Series objects **glued** together.
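
As a rough sketch of the "glued together" idea, two Series that share an index can be combined into a DataFrame, and selecting a column from that DataFrame gives back a Series. The names `ndvi`, `msavi`, and `plots` are illustrative, and the values are made up for this example.

```python
import pandas as pd

# Two illustrative Series sharing the same index (values are made up)
labels = ['Plot_1', 'Plot_2', 'Plot_3']
ndvi = pd.Series([0.45, 0.55, 0.68], index=labels, name='NDVI')
msavi = pd.Series([0.30, 0.41, 0.52], index=labels, name='MSAVI')

# Glue the Series side by side: each Series becomes one column,
# and its name becomes the column name
plots = pd.concat([ndvi, msavi], axis=1)
print(plots)

# Selecting one column returns a Series again
print(type(plots['NDVI']))
```

`pd.concat` with `axis=1` is only one way to do this; passing a dictionary of Series to `pd.DataFrame()` would work as well.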

### Reading Data Files

In [11]:
wine_reviews = pd.read_csv("Data/winemag-data-130k-v2.csv",index_col = 0)

In [12]:
wine_reviews.head()

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


In [13]:
wine_reviews.shape

(129971, 13)
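
The `index_col = 0` argument used above tells `read_csv` to treat the first column of the file as the index rather than as data. The sketch below shows the difference on a tiny in-memory CSV, assuming (as is common for files saved with their index) that the first column is an unnamed row counter. The `csv_text` string is a stand-in for illustration and is not the wine file itself.

```python
import io
import pandas as pd

# A tiny in-memory CSV whose first column is an unnamed row counter
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Without index_col, the counter is read as an ordinary column
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# With index_col=0, the first column becomes the index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())

# .shape is (number of rows, number of columns); the index is not counted
print(indexed.shape)
```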

### Indexing, Selecting And Assigning

#### --- Native Accessors

Native Python objects provide good ways of indexing data, and pandas carries these over as a starting point.

In [14]:
reviews = wine_reviews

In [15]:
reviews.country

0            Italy
1         Portugal
            ...   
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [16]:
reviews['country']

0            Italy
1         Portugal
            ...   
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [17]:
reviews['country'][0]

'Italy'
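
The two accessors above are interchangeable for simple column names, but the bracket form is the more general one. A minimal sketch, using a small made-up frame rather than the wine data (the column name `price usd` is hypothetical and chosen only because it contains a space):

```python
import pandas as pd

# Small illustrative frame (not the wine data)
df = pd.DataFrame({'country': ['Italy', 'Portugal'],
                   'price usd': [15.0, 14.0]})

# Attribute access and bracket access return the same column
print(df.country.equals(df['country']))

# A column name containing a space is only reachable with brackets
print(df['price usd'])

# Chained brackets: first select the column, then the row label
print(df['country'][0])
```

Attribute access also fails when a column name collides with an existing DataFrame attribute or method, which is another reason to prefer brackets in reusable code.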

#### --- Indexing in Pandas