# Creating data
There are two core objects in pandas: the `DataFrame` and the `Series`.

In [1]:
import pandas as pd

<h2>DataFrame</h2>
A DataFrame is a table. It contains an array of individual entries, each of which has a certain <i>value</i>. Each entry corresponds to a row (or record) and a column. 

In [3]:
# Consider this simple DataFrame:
pd.DataFrame({'Yes': [50,21], 'No':[131,2]})

# The "0, No" entry has the value 131, the "0, Yes" entry has the value 50, and so on.

Unnamed: 0,Yes,No
0,50,131
1,21,2
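To make the row and column addressing concrete, here is a minimal sketch that looks up individual entries with `.loc`, passing the row label first and the column name second. The variable names (`df`, `entry_0_no`, `entry_0_yes`) are chosen here for illustration only.

```python
import pandas as pd

# Same data as the DataFrame above; `df` is an illustrative name.
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# .loc takes a row label first, then a column label.
entry_0_no = df.loc[0, 'No']    # row 0, column 'No'
entry_0_yes = df.loc[0, 'Yes']  # row 0, column 'Yes'

print(entry_0_no, entry_0_yes)  # prints: 131 50
```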


DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [4]:
pd.DataFrame({'Bob':['I liked it.', 'It was awful.'], 'Sue':['Pretty good.', 'Bland']})

Unnamed: 0,Bob,Sue
0,I liked it.,Pretty good.
1,It was awful.,Bland


We are using the `pd.DataFrame()` constructor to generate these DataFrame objects. The syntax for declaring a new one is a dictionary whose keys are the column names (`Bob` and `Sue` in this example) and whose values are lists of entries.

The dictionary-list constructor assigns the values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, 4, ...) for the row labels.

The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an `index` parameter in our constructor.

In [5]:
pd.DataFrame({'Bob':['I liked it.', 'It was awful.'], 'Sue':['Pretty good.', 'Bland']}, index=['Product A', 'Product B'])

Unnamed: 0,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland
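As a rough illustration of what the `index` parameter changes, the sketch below reads the row labels back from the `.index` attribute and uses one of them for a lookup. The names `reviews`, `row_labels`, `bob_on_b`, and `default_labels` are illustrative and not part of the original notebook.

```python
import pandas as pd

reviews = pd.DataFrame(
    {'Bob': ['I liked it.', 'It was awful.'],
     'Sue': ['Pretty good.', 'Bland']},
    index=['Product A', 'Product B'],
)

# The row labels are stored on the .index attribute.
row_labels = list(reviews.index)

# With a custom index, rows are looked up by their label.
bob_on_b = reviews.loc['Product B', 'Bob']

# Without an index argument, pandas falls back to 0, 1, 2, ...
default_labels = list(pd.DataFrame({'Bob': ['I liked it.', 'It was awful.']}).index)

print(row_labels)
print(bob_on_b)
print(default_labels)
```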


<h2>Series</h2>
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. In fact, you can create one with nothing more than a list:

In [6]:
pd.Series([1,2,3,4,5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an `index` parameter.
However, a Series does not have a column name; it only has one overall `name`.

In [8]:
pd.Series([30,35,40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
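A short sketch of how the name and the row labels can be read back from a Series like the one above. The variable names (`sales`, `series_name`, `value_2016`, `labels`) are illustrative.

```python
import pandas as pd

sales = pd.Series(
    [30, 35, 40],
    index=['2015 Sales', '2016 Sales', '2017 Sales'],
    name='Product A',
)

# The single overall name lives on the .name attribute.
series_name = sales.name

# Values are looked up by row label, just as with a DataFrame index.
value_2016 = sales['2016 Sales']

# The row labels are available through .index.
labels = list(sales.index)

print(series_name, value_2016, labels)
```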

The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together". 
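One way to see the "glued together" picture is to build a DataFrame from two named Series and then pull a column back out. This is only a sketch: the `Product B` figures are made-up values for illustration, and `pd.concat(..., axis=1)` is just one of several ways to combine Series into a DataFrame.

```python
import pandas as pd

years = ['2015 Sales', '2016 Sales', '2017 Sales']
product_a = pd.Series([30, 35, 40], index=years, name='Product A')
# Hypothetical second product, invented for this example.
product_b = pd.Series([20, 25, 45], index=years, name='Product B')

# Placing the Series side by side yields a DataFrame;
# each Series name becomes a column name.
sales = pd.concat([product_a, product_b], axis=1)

# Selecting a single column gives back a Series.
column = sales['Product A']

print(sales)
print(type(column).__name__)
```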

<h2>Reading data files</h2>
Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file.

In [24]:
wine_review = pd.read_csv("/home/rachit/Desktop/Machine learning/12-Pandas/Data/wine-review/winemag-data-130k-v2.csv", index_col=0)

Note: the CSV file begins with an unnamed column of increasing integers. Passing `index_col=0` tells pandas to use that column as the index.
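The effect of `index_col=0` can be seen on a tiny in-memory CSV. The text in `csv_text` below is a hypothetical two-row stand-in, not the actual wine review file, and `io.StringIO` is used only so the sketch runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical miniature CSV whose first column has no header.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Without index_col, the unnamed column is kept as a regular column.
without_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row labels instead.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(without_index.columns))  # ['Unnamed: 0', 'country', 'points']
print(list(with_index.columns))     # ['country', 'points']
```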

In [26]:
# We can use the `shape` attribute to check how large the resulting DataFrame is (rows, columns):

wine_review.shape

(129971, 13)

In [25]:
# We can examine the contents of the resulting DataFrame using the head() method, which returns the first five rows by default:

wine_review.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


The `pd.read_csv()` function is well-endowed, with over 30 optional parameters you can specify.
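As a small taste of those optional parameters, the sketch below uses two of them, `usecols` and `nrows`, to load only some columns and only the first rows. The CSV text is a hypothetical sample invented for this example.

```python
import io
import pandas as pd

# Hypothetical sample data, not the real wine review file.
csv_text = (
    "country,points,price\n"
    "Italy,87,\n"
    "Portugal,87,15.0\n"
    "US,87,14.0\n"
)

subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['country', 'points'],  # keep only these columns
    nrows=2,                        # read only the first two data rows
)

print(subset)
```

Limiting columns and rows this way can be handy for a quick look at a large file before loading all of it.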

In [27]:
# Build a small DataFrame to write out as a CSV file
animals = pd.DataFrame({'Cows':[12,20], 'Goats':[22,19]}, index=['Year 1', 'Year 2'])
animals

Unnamed: 0,Cows,Goats
Year 1,12,22
Year 2,20,19


In [28]:
animals.to_csv("Cows_and_goats.csv")
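To check what `to_csv()` actually writes, the sketch below saves the same DataFrame into a temporary directory, prints the raw file text, and reads it back. The temporary directory is an assumption made so the example cleans up after itself; the notebook writes to the current working directory instead.

```python
import os
import tempfile
import pandas as pd

animals = pd.DataFrame(
    {'Cows': [12, 20], 'Goats': [22, 19]},
    index=['Year 1', 'Year 2'],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'Cows_and_goats.csv')
    animals.to_csv(path)

    # The index is written as the first, unnamed column.
    with open(path) as f:
        raw_text = f.read()

    # index_col=0 turns that first column back into row labels.
    restored = pd.read_csv(path, index_col=0)

print(raw_text)
print(restored)
```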