## Introduction to pandas data structures (Series and DataFrame)

1. Series: a Series is a one-dimensional array-like object containing a sequence of values (of types similar to NumPy types) and an associated array of data labels, called its *index*.


In [2]:
import numpy as np
import pandas as pd

In [3]:
# example 
obj=pd.Series([4,7,-5,3])
obj  # Since we did not specify an index for the data, a default one consisting of the integers
     # 0 through N - 1 (where N is the length of the data) is created.

0    4
1    7
2   -5
3    3
dtype: int64

In [4]:
#You can get the array representation and index object of the Series via its values and index attributes, respectively:

print(obj.values)
print(obj.index)

[ 4  7 -5  3]
RangeIndex(start=0, stop=4, step=1)
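As a side note, pandas 0.24 added `Series.to_numpy()`, which the pandas documentation recommends over `.values` for getting the data as a NumPy array. `.values` still works. A minimal sketch, assuming a pandas version recent enough to have `to_numpy()`:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

arr = obj.to_numpy()          # plain NumPy array holding the values
print(type(arr).__name__)     # ndarray
print(list(obj.index))        # default RangeIndex labels: [0, 1, 2, 3]
```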


In [5]:
#Often it will be desirable to create a Series with an index identifying each data point with a label:
obj2=pd.Series([4,7,-5,3],index=["a","b","c","d"])
obj2

a    4
b    7
c   -5
d    3
dtype: int64

In [6]:
obj2.index

Index(['a', 'b', 'c', 'd'], dtype='object')
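The index of an existing Series can also be replaced in place by assigning a new sequence of labels of the same length. A small sketch on the `obj` example (the names are purely illustrative labels):

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# Replace the default 0..3 labels with new labels of the same length
obj.index = ["Bob", "Steve", "Jeff", "Ryan"]
print(obj)
print(obj["Jeff"])   # -5: the values keep their positions, only the labels change
```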

In [7]:
#Compared with NumPy arrays, you can use labels in the index when selecting single values or a set of values:
#example
print(obj2['b'])
print(obj2[['a','b']])  # a list of labels selects several values

7
a    4
b    7
dtype: int64
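Plain `[]` indexing on a Series can treat an integer as either a label or a position depending on the index, which is a common source of confusion. The `.loc` (label-based) and `.iloc` (position-based) indexers make the choice explicit. A short sketch on `obj2`:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

print(obj2.loc["b"])          # select by label -> 7
print(obj2.iloc[1])           # select by integer position -> 7
print(obj2.loc[["c", "a"]])   # a list of labels returns a Series in that order
print(obj2.loc["b":"c"])      # label slices include the end label ('c')
```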


In [8]:
#Using NumPy functions or NumPy-like operations, such as filtering with a boolean
#array, scalar multiplication, or applying math functions, will preserve the index-value link
obj2[obj2>0]

a    4
b    7
d    3
dtype: int64

In [9]:
np.exp(obj2)

a      54.598150
b    1096.633158
c       0.006738
d      20.085537
dtype: float64
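The comment above also mentions scalar multiplication. A quick sketch showing that arithmetic and boolean filtering both keep each value attached to its label:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

doubled = obj2 * 2               # applied elementwise; labels a..d are unchanged
print(doubled)

positive = obj2[obj2 > 0]        # filtering keeps the labels of surviving values
print(positive.index.tolist())   # ['a', 'b', 'd']
```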

In [10]:
'e' in obj2

False
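Another way to think about a Series is as a fixed-length, ordered dict that maps index labels to values. That is why `in` tests the index labels rather than the values:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["a", "b", "c", "d"])

print("b" in obj2)         # True: 'b' is an index label
print(7 in obj2)           # False: 7 is a value, not a label
print(7 in obj2.values)    # True: membership test among the values instead
```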

In [11]:
#Should you have data contained in a Python dict, you can create a Series from it by passing the dict:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3=pd.Series(sdata)
obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
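Going the other way, `Series.to_dict()` converts a Series back into a dict keyed by the index labels. A minimal round-trip sketch with the `sdata` example:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)

back = obj3.to_dict()   # index labels become keys, Series values become dict values
print(back)
```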

In [12]:
# When you pass only a dict, the index of the resulting Series follows the order of the
# dict's keys (insertion order in pandas 0.23+ on Python 3.6+, as the output above shows).
# You can override this by passing the index labels in the order you want them to appear:
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4=pd.Series(sdata,index=states)
obj4  # No value for 'California' was found in sdata, so it appears as NaN (not a number),
      # which pandas uses to mark missing or NA values. 'Utah' is not in states, so it is excluded.

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
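Missing values like the `NaN` above can be detected with `pd.isna` and `pd.notna` (also available as the Series methods `isna()` and `notna()`) and removed with `dropna()`. A sketch that rebuilds `obj4` from the same data:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]
obj4 = pd.Series(sdata, index=states)

missing = pd.isna(obj4)       # boolean Series, True where the value is NaN
print(missing)
print(obj4.notna().sum())     # number of non-missing values -> 3
print(obj4.dropna())          # drops the NaN entry for 'California'
```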

## DataFrame
A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). It has both a row index and a column index, and can be thought of as a dict of Series that all share the same row index.
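Because a DataFrame behaves like a dict of Series, selecting one column by name returns a Series that carries the frame's row index, and each column keeps its own dtype. A sketch using a hypothetical two-row subset of the example data:

```python
import pandas as pd

# Hypothetical two-row subset of the state/year/pop data
data = {"state": ["Ohio", "Nevada"], "year": [2000, 2004], "pop": [1.5, 2.9]}
frame = pd.DataFrame(data)

col = frame["state"]                    # dict-like access returns one column
print(type(col).__name__)               # Series
print(col.index.equals(frame.index))    # True: every column shares the row index
print(frame.dtypes)                     # each column has its own dtype
```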

In [15]:
#example 
data ={'state':['Ohio','Louisiana','New York','New Jersey','Nevada','Pennsylvania'],
       'year':[2000,2001,2002,2003,2004,2005],
       'pop':[1.5,1.7,3.6,2.4,2.9,2.9]}
frame=pd.DataFrame(data)
frame

          state  year  pop
0          Ohio  2000  1.5
1     Louisiana  2001  1.7
2      New York  2002  3.6
3    New Jersey  2003  2.4
4        Nevada  2004  2.9
5  Pennsylvania  2005  2.9


In [17]:
# Columns can be modified or added by assignment. For example, a new 'debt'
# column can be assigned a scalar value or an array of values:
frame['debt']=np.arange(6.)
frame

          state  year  pop  debt
0          Ohio  2000  1.5   0.0
1     Louisiana  2001  1.7   1.0
2      New York  2002  3.6   2.0
3    New Jersey  2003  2.4   3.0
4        Nevada  2004  2.9   4.0
5  Pennsylvania  2005  2.9   5.0
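The cell above only shows array assignment. A sketch on a hypothetical three-row frame contrasting the two cases: a scalar is broadcast to every row, while an array needs exactly one value per row.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame
frame = pd.DataFrame({"state": ["Ohio", "Nevada", "Texas"],
                      "year": [2000, 2001, 2002]})

frame["debt"] = 16.5                   # a scalar is broadcast to every row
after_scalar = frame["debt"].tolist()  # [16.5, 16.5, 16.5]

frame["debt"] = np.arange(3.0)         # an array needs one value per row
print(frame)
```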


In [None]:
#When you are assigning lists or arrays to a column, the value’s length must match the
#length of the DataFrame. If you assign a Series, its labels will be realigned exactly to
#the DataFrame’s index, inserting missing values in any holes:
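This last cell has no code, so here is a hedged sketch of both rules on a small hypothetical frame (the row labels `one` to `four` and the `val` Series are illustrative, not part of the original data). In recent pandas versions, a length mismatch raises `ValueError`:

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with string row labels
frame = pd.DataFrame({"year": [2000, 2001, 2002, 2003]},
                     index=["one", "two", "three", "four"])

# Rule 1: a list or array must have exactly one value per row
raised = False
try:
    frame["debt"] = [1.0, 2.0]        # 2 values for 4 rows
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Rule 2: a Series is aligned on the frame's index by label. Rows whose
# label is missing from the Series get NaN; Series labels that are not in
# the frame ("five") are dropped.
val = pd.Series([-1.2, -1.5, -1.7], index=["two", "four", "five"])
frame["debt"] = val
print(frame)
```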
