# Examples of Creating, Reading and Writing Data Frames and Series

Before creating a Data Frame or a Series, we need to import a library called ***pandas*** in Python.

By typing the code below:

In [2]:
import pandas as pd

This tells Python to import the pandas library and give it the alias **pd**.

## Data Frame

Think of a Data Frame as a table made up of ***columns*** (↓) and ***rows*** ( --> ).


It is often created from a Python **dictionary**, so it may look familiar >_<

A Data Frame can store different kinds of values, for example:
 - Numbers
 - Strings

In [3]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

Unnamed: 0,Yes,No
0,50,131
1,21,2


`pd.DataFrame()` calls the ***DataFrame*** constructor from the pandas library.

Inside the parentheses (), we pass a dictionary in the format:

`{'column_name1': ['content1', 'content2'], 'column_name2': ['content1', 'content2'], ....}`

That's the **reason** a Data Frame looks similar to a Python dictionary: each key becomes a column name and each list becomes the values of that column. Now let's recap, to create a Data Frame:

`pd.DataFrame({'column_name1': ['content1', 'content2'], 'column_name2': ['content1', 'content2'], ....})`

In [4]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland,']})

Unnamed: 0,Bob,Sue
0,I liked it.,Pretty good.
1,It was awful.,"Bland,"
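
Both examples so far hold a single kind of value per table. As a minimal sketch (the column names `Score` and `Comment` are made up for illustration), one Data Frame can mix a numeric column with a text column, as long as every list has the same length:

```python
import pandas as pd

# Hypothetical data: one column of numbers and one column of strings.
reviews = pd.DataFrame({
    'Score': [50, 21],
    'Comment': ['I liked it.', 'It was awful.'],
})

print(reviews.shape)   # (2, 2)
print(reviews.dtypes)  # one data type per column
```

Each column keeps its own data type, which is why numbers and strings can sit side by side in the same table.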


Did you notice the **0** and **1** in the examples?

Those are the ***default index labels*** of the Data Frame, and we can replace them with our own.

Just pass **`index = ['name_of_index1','name_of_index2',...]`** as a second argument, after the content of the Data Frame:

In [5]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue':['Pretty good.', 'Bland.']}, index = ['Product A', 'Product B'])

Unnamed: 0,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland.


That's all for the Data Frame, let's jump into the next section.

## Series

A Series is like a list in which every value is paired with an index label that acts as its key.

**Each key corresponds to a value.**

In [6]:
# Series

In [7]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

In this example, the value at the first index label (key 0) is 1, the value at key 1 is 2, and so forth.

To create a new Series, use:

- `pd.Series([value1, value2, value3, value4,......])`

As with the Data Frame, this tells the interpreter to *call Series from the pandas library*, with the values listed inside the square brackets [ ].

In [8]:
pd.Series([30, 35, 40], index = ['2015 Sales', '2016 Sales', '2017 Sales'], name = 'Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64

As with a Data Frame, we can assign our own index:
  - add `index = ['name_of_index1','name_of_index2',...]` after the values of the Series

However, a Series doesn't have column names (since it is only a single column of a Data Frame). Instead, it has one overall name:
 - ` name = 'name_of_Series'`
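
As a small sketch of how the index and the name can be read back, here is the *Product A* Series from above again. The attribute names `.name` and `.index` are standard pandas; the variable name `sales` is only for illustration:

```python
import pandas as pd

sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

print(sales.name)            # Product A
print(sales.index.tolist())  # ['2015 Sales', '2016 Sales', '2017 Sales']

# An index label works like a key: it looks up the matching value.
print(sales['2016 Sales'])   # 35
```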

Finally, a short description that links all the concepts above:

In [9]:
# DataFrame is just a bunch of Series glued together
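
One way to see this "gluing" in action is `pd.concat` with `axis=1`, which places Series side by side as columns. This is only a sketch: the *Product B* numbers are invented for illustration, and `pd.concat` is one of several ways to combine Series.

```python
import pandas as pd

years = ['2015 Sales', '2016 Sales', '2017 Sales']

product_a = pd.Series([30, 35, 40], index=years, name='Product A')
# Made-up values, only here so there is a second column to glue on.
product_b = pd.Series([12, 18, 25], index=years, name='Product B')

# Glue the two Series together as columns; each Series name becomes a column name.
sales = pd.concat([product_a, product_b], axis=1)
print(sales.shape)             # (3, 2)
print(sales.columns.tolist())  # ['Product A', 'Product B']

# Picking one column out of the Data Frame gives back a Series.
column = sales['Product A']
print(type(column).__name__)   # Series
```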

## Reading data files

In [10]:
# Reading data files

In practice, we rarely create a Data Frame or a Series by hand, handy as that is. Instead, we work with data that already exists.

To read a data file:

In [11]:
wine_reviews = pd.read_csv("winemag-data-130k-v2.csv")

Data can be stored in various formats; a commonly used one is CSV (comma-separated values). Use the `pd.read_csv('file_name')` function to read such a file into a Data Frame.


To quickly check how big the resulting Data Frame is, use `variable.shape`, where ***variable*** is the Data Frame that the file was read into.

In [12]:
wine_reviews.shape

(129971, 14)

So, the Data Frame contains nearly 130,000 rows of records (-->) and 14 columns.
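
Since `shape` is a plain tuple of (rows, columns), it can be unpacked into two variables. A minimal sketch on a small hand-made Data Frame, so it runs without the wine file:

```python
import pandas as pd

votes = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# shape is (number_of_rows, number_of_columns)
n_rows, n_cols = votes.shape
print(n_rows, n_cols)  # 2 2

# len() of a Data Frame gives the number of rows as well.
print(len(votes))      # 2
```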

The `head()` method gives us a quick preview of the first five records of the Data Frame (here, the rows labelled 0 to 4).

In [13]:
wine_reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
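
Five rows is only the default. As a small sketch on a made-up Data Frame (the column name `n` is arbitrary), `head()` also accepts the number of rows to show:

```python
import pandas as pd

numbers = pd.DataFrame({'n': range(10)})

# Without an argument, head() returns the first five rows.
print(numbers.head().index.tolist())  # [0, 1, 2, 3, 4]

# With an argument, it returns that many rows instead.
print(len(numbers.head(3)))           # 3
```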


The `read_csv()` function accepts over 30 optional parameters. For example, the dataset has a built-in index column, which shows up as **Unnamed: 0**. To make pandas use that column as the index, we can specify:

`index_col = 0`

In other words, this asks pandas to use the column at position 0 as the index.

In [14]:
wine_reviews = pd.read_csv('winemag-data-130k-v2.csv', index_col = 0)
wine_reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
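
The effect of `index_col` can be reproduced without the wine file. The sketch below feeds a tiny made-up CSV to `read_csv()` through `io.StringIO`; the text in `csv_text` is an assumption that mimics a file whose first column has no header:

```python
import io
import pandas as pd

# A tiny stand-in for the real file: the first column has an empty header.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Default: pandas keeps the header-less column as data and names it 'Unnamed: 0'.
default = pd.read_csv(io.StringIO(csv_text))
print(default.columns.tolist())  # ['Unnamed: 0', 'country', 'points']

# With index_col=0, that first column becomes the index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())  # ['country', 'points']
print(indexed.index.tolist())    # [0, 1]
```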


## Writing a data file

After making some edits to the dataset (Data Frame), we sometimes want to save the result as a CSV file.

`variable.to_csv('filename')`

Let's look at an example to get a better view.

In [15]:
wine_reviews.to_csv('edited_reviews.csv')

This creates a CSV file named *edited_reviews.csv* from the Data Frame *wine_reviews*, into which we read the original data file.
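
To see what `to_csv()` actually writes, here is a sketch that uses the small *Bob*/*Sue* Data Frame instead of the wine data. Calling `to_csv()` without a file name returns the CSV text as a string, which makes it easy to inspect. The variable names are only for illustration:

```python
import io
import pandas as pd

ratings = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']},
                       index=['Product A', 'Product B'])

# With no file name, to_csv() returns the CSV content as a string.
csv_text = ratings.to_csv()
print(csv_text.splitlines()[0])  # ,Bob,Sue

# The index is written as a first column with an empty header, so reading
# the text back with index_col=0 restores the original labels.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored.index.tolist())   # ['Product A', 'Product B']

# index=False leaves the index out of the output.
print(ratings.to_csv(index=False).splitlines()[0])  # Bob,Sue
```

The empty first header is also where a column like **Unnamed: 0** comes from when such a file is read back without `index_col`.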