# **Introduction to pandas Data Structures**

In [2]:
import numpy as np
import pandas as pd

### **Series** :

In [4]:
from pandas import Series, DataFrame

In [7]:
obj = pd.Series(['data1','data2','data3'])
obj

0    data1
1    data2
2    data3
dtype: object

In [8]:
obj.array

<NumpyExtensionArray>
['data1', 'data2', 'data3']
Length: 3, dtype: object

In [10]:
obj.index

RangeIndex(start=0, stop=3, step=1)

In [11]:
obj2 = pd.Series(['D1','D2','D3','D4'], index=['a','b','c','d'])
obj2

a    D1
b    D2
c    D3
d    D4
dtype: object

Using NumPy functions or NumPy-like operations, such as filtering with a Boolean array, scalar multiplication, or applying math functions, will preserve the index-value link:

In [14]:
obj3 = pd.Series([-2,-1,0,1,2],index=['a','b','c','d','e'])
obj3[obj3>0]

d    1
e    2
dtype: int64
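The cell above shows only Boolean filtering. As a minimal sketch of the other two operations mentioned (it re-creates `obj3` so it runs on its own, and the names `doubled` and `exped` are illustrative), scalar multiplication and a NumPy function such as `np.exp` should also return a Series with the same labels:

```python
import numpy as np
import pandas as pd

obj3 = pd.Series([-2, -1, 0, 1, 2], index=['a', 'b', 'c', 'd', 'e'])

doubled = obj3 * 2      # scalar multiplication, applied element-wise
exped = np.exp(obj3)    # a NumPy ufunc applied to a Series returns a Series

# Both results keep the original index labels
print(list(doubled.index))
print(list(exped.index))
print(doubled['e'])
```

Because the labels travel with the values, a result can still be looked up by label (for example `doubled['e']`) after the computation.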

In [25]:
'e' in obj3

True

- You can create a Series by passing a dictionary to it; the dictionary keys become the index labels.

In [28]:
dct = {
    'name':'sepehr',
    'age' : 22,
    'birthdate' : '2001/01/12',
    'city' : 'Shiraz',
    'major' : 'Computer Engineering'
    }
data = pd.Series(dct)
data

name                       sepehr
age                            22
birthdate              2001/01/12
city                       Shiraz
major        Computer Engineering
dtype: object
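As a small sketch of the dictionary round trip (it uses a shortened, illustrative version of the dictionary so it runs on its own), the keys should appear as index labels in insertion order, and `to_dict()` converts the Series back to a plain dictionary:

```python
import pandas as pd

dct = {'name': 'sepehr', 'age': 22, 'city': 'Shiraz'}
s = pd.Series(dct)

# Keys become the index labels, in insertion order
print(list(s.index))

# to_dict() converts the Series back to a plain dictionary
round_trip = s.to_dict()
print(round_trip == dct)
```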

In [45]:
indx = ['name', 'term', 'age', 'major']  # use a list: a set has no guaranteed order
data = pd.Series(dct , index=indx)
data

name                   sepehr
term                      NaN
age                        22
major    Computer Engineering
dtype: object

In [53]:
data.isna()

name     False
term      True
age      False
major    False
dtype: bool

In [54]:
data.name = "Students"
data.index.name = "Student Info"
data

Student Info
name                   sepehr
term                      NaN
age                        22
major    Computer Engineering
Name: Students, dtype: object

### **DataFrames** :

In [55]:
data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002, 2003],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]
    }
df = pd.DataFrame(data)
df


    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2


In [61]:
pd.DataFrame(data, columns=["year", "state", "pop","something"])

   year   state  pop something
0  2000    Ohio  1.5       NaN
1  2001    Ohio  1.7       NaN
2  2002    Ohio  3.6       NaN
3  2001  Nevada  2.4       NaN
4  2002  Nevada  2.9       NaN
5  2003  Nevada  3.2       NaN


- A column in a DataFrame can be retrieved as a Series, either with dictionary-like notation (`df["year"]`) or with attribute access (`df.year`).

In [63]:
df.year  # equivalent to df["year"]

0    2000
1    2001
2    2002
3    2001
4    2002
5    2003
Name: year, dtype: int64
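One caveat is worth sketching here: attribute access only works when the column name is a valid Python identifier that does not collide with an existing DataFrame attribute. The `pop` column used in this notebook is such a collision, because `DataFrame.pop` is a method. The example below uses a shortened, illustrative copy of the data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["Ohio", "Nevada"],
    "year": [2000, 2001],
    "pop": [1.5, 2.4],
})

# Bracket notation always returns the column as a Series
print(type(df["pop"]).__name__)

# Attribute access resolves to the DataFrame.pop method, not the column
print(callable(df.pop))
```

For that reason, bracket notation is the safer default when column names are not known in advance.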

In [64]:
df.loc[1]

state    Ohio
year     2001
pop       1.7
Name: 1, dtype: object