# <a href='https://www.kaggle.com/code/residentmario/creating-reading-and-writing'>Creating, Reading and Writing</a>

In [1]:
import pandas as pd

There are two main data structures in pandas: `DataFrame` and `Series`.

## Creating data

### DataFrame

The standard way of declaring a new DataFrame is to pass a dictionary whose keys are the column names and whose values are lists of entries, one list per column.

In [2]:
pd.DataFrame(
    { 'Even Series': [0, 2, 4, 'six', 8], 'Odd Series': [1, 3, 'five', 7, 9] }
)

,Even Series,Odd Series
0,0,1
1,2,3
2,4,five
3,six,7
4,8,9
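
As a quick check on the dictionary form, the sketch below rebuilds the same DataFrame under a hypothetical variable name, `numbers`, and inspects it. It assumes nothing beyond pandas itself: the keys become the column labels, and a column that mixes integers and strings is stored with the generic `object` dtype.

```python
import pandas as pd

# Keys become column labels; each list becomes that column's values.
# `numbers` is a name chosen for this illustration only.
numbers = pd.DataFrame(
    {'Even Series': [0, 2, 4, 'six', 8], 'Odd Series': [1, 3, 'five', 7, 9]}
)

print(list(numbers.columns))  # column labels, in dictionary order
print(numbers.shape)          # (number of rows, number of columns)
print(numbers.dtypes)         # dtype of each column
```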


While column labels are assigned at creation, row labels default to an ascending count starting at 0. <br>
Custom row labels can be assigned with the `index` parameter.

In [3]:
evens = [0, 2, 4, 6, 8]
odds = [1, 3, 5, 7, 9]
row_labels = ['zero, one', 'two, three', 'four, five', 'six, seven', 'eight, nine']

pd.DataFrame(
    {'Even Series': evens, 'Odd Series': odds},
    index=row_labels
)

,Even Series,Odd Series
"zero, one",0,1
"two, three",2,3
"four, five",4,5
"six, seven",6,7
"eight, nine",8,9
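
One practical use of custom row labels is looking up a row by its label. The sketch below assumes the `.loc` accessor, which the text above does not cover, and a hypothetical variable name `labeled`.

```python
import pandas as pd

# `labeled` is a name chosen for this illustration only.
labeled = pd.DataFrame(
    {'Even Series': [0, 2, 4, 6, 8], 'Odd Series': [1, 3, 5, 7, 9]},
    index=['zero, one', 'two, three', 'four, five', 'six, seven', 'eight, nine'],
)

# Look up a single row by its custom label; the result is a Series
# whose own labels are the column names.
row = labeled.loc['four, five']
print(row['Even Series'], row['Odd Series'])
```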


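A DataFrame can also be created from a list of rows, with the column names and row labels passed separately through the `columns` and `index` parameters: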

In [4]:
fruit_sales = pd.DataFrame(
    [[35, 21], [41, 34]],
    columns=['Apples', 'Bananas'],
    index=['2017 Sales', '2018 Sales']
)

fruit_sales

,Apples,Bananas
2017 Sales,35,21
2018 Sales,41,34
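
To see how the row-oriented form relates to the dictionary form, the following sketch builds the same table both ways and compares them. The names `by_rows` and `by_columns` are made up for this example.

```python
import pandas as pd

# Row-oriented: each inner list is one row.
by_rows = pd.DataFrame(
    [[35, 21], [41, 34]],
    columns=['Apples', 'Bananas'],
    index=['2017 Sales', '2018 Sales'],
)

# Column-oriented: each dictionary value is one column.
by_columns = pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=['2017 Sales', '2018 Sales'],
)

# DataFrame.equals compares shape, labels, and values.
print(by_rows.equals(by_columns))
```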


### Series

A Series is a sequence of data values, much like a list. Essentially, a Series is a single column of a DataFrame, and a DataFrame can be thought of as a bunch of Series glued together.

In [5]:
pd.Series([5, True, 15, 20, 25])

0       5
1    True
2      15
3      20
4      25
dtype: object
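
The relationship between the two structures can be checked directly. This is a minimal sketch with made-up variable names (`apples`, `bananas`, `glued`): two Series are combined into a DataFrame, and selecting one column hands back a Series again.

```python
import pandas as pd

apples = pd.Series([35, 41])
bananas = pd.Series([21, 34])

# Glue two Series together, side by side, into one DataFrame.
glued = pd.DataFrame({'Apples': apples, 'Bananas': bananas})

# Selecting a single column returns a Series.
column = glued['Apples']
print(type(column).__name__)
print(column.tolist())
```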

A Series does not have a column name; it has only one overall name, which can be assigned with the `name` parameter. <br>
Custom row labels can be assigned to a Series with the `index` parameter, just as for a DataFrame.

In [6]:
pd.Series(
    [1, 2, 3, 4, 5],
    index=['one', 'two', 'three', 'four', 'five'],
    name='onefive'
)

one      1
two      2
three    3
four     4
five     5
Name: onefive, dtype: int64
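
One place the Series name shows up is when the Series is turned into a DataFrame. The sketch below assumes the `Series.to_frame` method, which is not covered above, and shows the name becoming the column label while the custom index is kept.

```python
import pandas as pd

onefive = pd.Series(
    [1, 2, 3, 4, 5],
    index=['one', 'two', 'three', 'four', 'five'],
    name='onefive',
)

# Custom labels can be used to look up individual values.
print(onefive['three'])

# Converting to a one-column DataFrame: the Series name becomes the column name.
frame = onefive.to_frame()
print(list(frame.columns))
```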

## Reading data

In [7]:
wine_reviews = pd.read_csv('winemag-data-130k-v2.csv')
wine_reviews.shape

(129971, 14)

In [8]:
wine_reviews.head()

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


The output shows that the CSV file has a built-in index (the `Unnamed: 0` column), which pandas did not pick up automatically. To make pandas use that column as the index instead of creating a new one from scratch, we can specify an `index_col`.

In [9]:
wine_reviews = pd.read_csv('winemag-data-130k-v2.csv', index_col=0)
wine_reviews.head()

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
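
The effect of `index_col` can be reproduced without the wine dataset. The sketch below uses a tiny in-memory CSV as a stand-in (an assumption for illustration, read through `io.StringIO`), with two rows and two columns borrowed from the output above. The empty first header field is what pandas labels `Unnamed: 0`.

```python
import io

import pandas as pd

# Stand-in for a CSV file whose first column is a saved index.
# Note the empty first field in the header row.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Default: the saved index is read as an ordinary column.
default = pd.read_csv(io.StringIO(csv_text))

# With index_col=0: the first column becomes the row labels.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns))
print(list(indexed.columns))
```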


## Writing data

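To save a DataFrame to a CSV file, use `DataFrame.to_csv('filename.csv')`. The row labels are written as the first column of the file, so pass `index_col=0` when reading the file back to restore them as the index.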

In [10]:
fruit_sales = pd.DataFrame(
    [[35, 21], [41, 34]],
    columns=['Apples', 'Bananas'],
    index=['2017 Sales', '2018 Sales']
)

fruit_sales

,Apples,Bananas
2017 Sales,35,21
2018 Sales,41,34


In [11]:
fruit_sales.to_csv('fruit_sales.csv')

In [12]:
fruits = pd.read_csv('fruit_sales.csv')
fruits

,Unnamed: 0,Apples,Bananas
0,2017 Sales,35,21
1,2018 Sales,41,34


In [13]:
fruits = pd.read_csv('fruit_sales.csv', index_col=0)
fruits

,Apples,Bananas
2017 Sales,35,21
2018 Sales,41,34
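
To see why the extra column appears when `index_col` is omitted, it helps to look at the text that `to_csv` produces. This sketch relies on `to_csv` returning the CSV as a string when no path is given, and reads it back through `io.StringIO` rather than a file; the names `csv_text` and `round_trip` are made up for the example.

```python
import io

import pandas as pd

fruit_sales = pd.DataFrame(
    [[35, 21], [41, 34]],
    columns=['Apples', 'Bananas'],
    index=['2017 Sales', '2018 Sales'],
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = fruit_sales.to_csv()
print(csv_text)  # the header row starts with an empty field for the index

# Reading the text back with index_col=0 restores the row labels.
round_trip = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(round_trip)
```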
