pandas
---

# start with pandas


In [1]:
import pandas as pd


# data types in pandas
## data frame
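A DataFrame is a table: a two-dimensional structure with labelled rows and columns. Each column can hold its own data type. When you build one from a Python dictionary, as below, the keys become the column names and the lists become the column values.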


In [2]:
pd.DataFrame({'math': [20, 12], 'physics': [15, 17.5]})

|   | math | physics |
|---|---|---|
| 0 | 20 | 15.0 |
| 1 | 12 | 17.5 |
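To check that each column keeps its own data type, here is a quick sketch that stores the same table in a variable (the name `df` is our choice, not part of the original notebook) and inspects its column labels and dtypes:

```python
import pandas as pd

# the same table as above, stored in a variable so we can inspect it
df = pd.DataFrame({'math': [20, 12], 'physics': [15, 17.5]})

print(df.columns.tolist())  # the column labels, taken from the dict keys
print(df.dtypes)            # one dtype per column
```

`math` holds only integers, so pandas stores it as an integer column, while the 17.5 in `physics` makes that whole column floating point. That is why 15 is displayed as 15.0 in the table above.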


Sometimes we want to assign our own row labels instead of the default integer labels (0, 1, 2, ...). We can do this with the `index` parameter:

In [3]:
pd.DataFrame({'math': [20, 12], 'physics': [15, 17.5]},
             index=['ali', 'mahdi'])

|   | math | physics |
|---|---|---|
| ali | 20 | 15.0 |
| mahdi | 12 | 17.5 |
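Once rows have labels, we can look a row up by its label with `.loc`. A minimal sketch using the labelled table above (the variable name `scores` is our own):

```python
import pandas as pd

scores = pd.DataFrame({'math': [20, 12], 'physics': [15, 17.5]},
                      index=['ali', 'mahdi'])

# select the whole row labelled "ali"; it comes back as a Series
print(scores.loc['ali'])

# select a single cell: row "mahdi", column "physics"
print(scores.loc['mahdi', 'physics'])
```

Passing a single row label returns that row as a one-dimensional object, while passing a row label and a column label returns just the value in that cell.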


## series
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. 

In [4]:
pd.Series([9.75, 20, 15])

0     9.75
1    20.00
2    15.00
dtype: float64
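The two types are closely related: every column of a DataFrame is itself a Series. A small sketch showing this, reusing the two-column table from earlier:

```python
import pandas as pd

df = pd.DataFrame({'math': [20, 12], 'physics': [15, 17.5]})

# selecting one column with [] returns a Series, not a DataFrame
math_column = df['math']
print(type(math_column))   # the pandas Series class
print(math_column.name)    # the column label becomes the Series name
```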

We can assign row labels to a Series the same way as before, using the `index` parameter. However, a Series does not have column names; it has only one overall name, which we set with the `name` parameter:

In [5]:
pd.Series([9.75, 20, 15], index=["math", "physics", "chemistry"], name="scores")

math          9.75
physics      20.00
chemistry    15.00
Name: scores, dtype: float64
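With an index in place, values in a Series can be looked up by label, and the Series name becomes the column header if the Series is turned back into a DataFrame. A short sketch:

```python
import pandas as pd

scores = pd.Series([9.75, 20, 15],
                   index=['math', 'physics', 'chemistry'],
                   name='scores')

print(scores['physics'])   # look up a value by its label
print(scores.name)         # the Series' single overall name

# to_frame() turns the Series into a one-column DataFrame;
# the Series name is used as the column label
print(scores.to_frame())
```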

# read data from csv
Most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.

A CSV file is a table of values separated by commas, hence the name "Comma-Separated Values", or CSV.

We'll use the `pd.read_csv()` function to read the data into a DataFrame.

In [6]:
wine_reviews = pd.read_csv("pandas/wine-reviews/winemag-data_first150k.csv")
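The call above assumes the wine-reviews CSV sits at that relative path on your machine. To try `pd.read_csv()` without the file, here is a self-contained sketch that feeds it a tiny CSV held in memory via `io.StringIO` (the rows are the toy scores from earlier, not the wine data):

```python
import io
import pandas as pd

# a tiny CSV kept in a string; io.StringIO makes it behave like an open file
csv_text = """name,math,physics
ali,20,15
mahdi,12,17.5
"""

students = pd.read_csv(io.StringIO(csv_text))
print(students)
```

The first line of the text is used as the column header, and pandas creates a fresh integer index for the rows.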

# size of data
We can use the `shape` attribute to check how large the resulting DataFrame is. It returns a `(rows, columns)` tuple:


In [8]:
wine_reviews.shape

(129971, 14)
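Because `shape` is a plain tuple, it can be unpacked directly, and `len()` gives the row count on its own. A sketch on a small stand-in DataFrame, since the wine file is not bundled here:

```python
import pandas as pd

df = pd.DataFrame({'math': [20, 12], 'physics': [15, 17.5]})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(len(df))             # number of rows only
```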

# DataFrame.head
We can examine the contents of the resulting DataFrame using the `head()` method, which returns the first five rows by default:

In [9]:
wine_reviews.head()

|   | Unnamed: 0 | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
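`head()` also accepts the number of rows to return, and its counterpart `tail()` returns rows from the end instead. A sketch on a small stand-in DataFrame of ten numbers:

```python
import pandas as pd

# ten rows of simple numbers stand in for the wine reviews
df = pd.DataFrame({'value': range(10)})

first_three = df.head(3)   # first 3 rows instead of the default 5
last_two = df.tail(2)      # last 2 rows
print(first_three)
print(last_two)
```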


The `pd.read_csv()` function is very flexible, with over 30 optional parameters you can specify. For example, this CSV file already has a built-in index in its first column, but pandas did not recognize it automatically and instead read it in as an ordinary column named `Unnamed: 0`. To make pandas use that column as the index (instead of creating a new one from scratch), we can pass `index_col=0`:

In [14]:
wine_reviews = pd.read_csv("pandas/wine-reviews/winemag-data_first150k.csv", index_col=0)
wine_reviews.head()

|   | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
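To see where a column like `Unnamed: 0` comes from and what `index_col=0` changes, here is a sketch with a small in-memory CSV whose first header field is empty, mimicking the layout of the wine file (the rows are the toy scores from earlier):

```python
import io
import pandas as pd

# the first header field is empty, like the index column in the wine CSV
csv_text = """,math,physics
ali,20,15
mahdi,12,17.5
"""

# without index_col, the unnamed column is read as ordinary data
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# with index_col=0, the first column becomes the row index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())
print(indexed.index.tolist())
```

Without `index_col`, pandas gives the header-less column the placeholder name `Unnamed: 0`; with `index_col=0`, those values become the row labels and the placeholder column disappears.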


You can save your DataFrame to a CSV file with the `to_csv()` method:

```python
# write the DataFrame to a CSV file; replace the path with your own
wine_reviews.to_csv('wine_reviews_copy.csv')
```
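By default, `to_csv()` also writes the index as the first column with an empty header. This is a common way a CSV ends up with an unnamed first column like the one in the wine data. Passing `index=False` leaves the index out. When no path is given, `to_csv()` returns the CSV text as a string, which makes the difference easy to see in a sketch:

```python
import pandas as pd

df = pd.DataFrame({'math': [20, 12], 'physics': [15, 17.5]},
                  index=['ali', 'mahdi'])

# with no path argument, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```

If you write a file with the index included, reading it back with `index_col=0` restores the row labels instead of producing an `Unnamed: 0` column.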