# Pandas Data Structures
This guide starts with the basics of Pandas and covers the following topics:

1. Load Pandas
2. Series
3. DataFrame

# 1. Load Pandas

In [110]:
import pandas as pd
from pandas import Series, DataFrame

<hr style="border:2px solid gray"> </hr>

# 2. Series
A `Series` is a `one-dimensional array-like object` containing a `sequence of values` (with data types similar to NumPy's) and an associated array of data labels, called its `index`.

**2.1 A simple Series**

In [111]:
#Create a simple Series
obj = pd.Series([2,4,6,8])

In [112]:
#A series object contains a sequence of values and an array of data labels, called its index
print(obj.values)
print(obj.index)

[2 4 6 8]
RangeIndex(start=0, stop=4, step=1)


**2.2 Series with custom index**

In [113]:
#Create a series with Index
obj = pd.Series([2,4,6,8], index=['a','b','c','d'])

In [114]:
#Print obj
print(obj)

a    2
b    4
c    6
d    8
dtype: int64


**Check Series Index and Values**

In [115]:
#Check values and index
print(obj.values)
print(obj.index)

[2 4 6 8]
Index(['a', 'b', 'c', 'd'], dtype='object')


**2.3 Accessing Series Elements - Using index values**

In [23]:
#Using index value
obj['a']

2

In [26]:
#Using multiple index values
obj[['a','b','d']]

a    2
b    4
d    8
dtype: int64

**2.4 Accessing Series Elements - Using index position**

In [27]:
#Using integer position via .iloc (plain obj[0] on a label index is deprecated in recent pandas)
obj.iloc[0]

2

In [28]:
#Using multiple integer positions via .iloc
obj.iloc[[0,2,3]]

a    2
c    6
d    8
dtype: int64

In [29]:
#Using slicing operator
obj[1:3]

b    4
c    6
dtype: int64
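The explicit accessors `.loc` (by label) and `.iloc` (by position) make the intent of a lookup unambiguous. The following is a minimal sketch that reuses the same values as the Series above; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([2, 4, 6, 8], index=["a", "b", "c", "d"])

by_label = s.loc["b"]          # label-based lookup
by_position = s.iloc[1]        # position-based lookup
label_slice = s.loc["b":"c"]   # label slices include both endpoints
pos_slice = s.iloc[1:3]        # positional slices exclude the stop position

print(by_label, by_position)          # both select the value 4
print(label_slice.equals(pos_slice))  # both slices select labels b and c
```

Note the difference in slice semantics: a label slice includes its end label, while a positional slice follows the usual Python half-open convention.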

**2.5 Boolean Filtering and Vectorized Operations**

In [34]:
#Check values greater than 2
obj[obj>2]

b    4
c    6
d    8
dtype: int64

In [37]:
#Scalar multiplication
obj * 2

a     4
b     8
c    12
d    16
dtype: int64

In [122]:
#Check if a value is in the series (test against .values; `in obj` alone checks the index labels)
4 in obj.values

True
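Membership tests on a Series are easy to get wrong, because `in` applied directly to a Series looks at the index labels rather than the values. A short sketch of the difference, assuming the same label-indexed Series as above (variable names are illustrative):

```python
import pandas as pd

s = pd.Series([2, 4, 6, 8], index=["a", "b", "c", "d"])

label_present = "a" in s          # tests the index labels
value_as_label = 4 in s           # 4 is a value, not a label
value_present = 4 in s.values     # tests the underlying array of values
mask = s.isin([4, 8])             # element-wise membership, returns a boolean Series

print(label_present, value_as_label, value_present)
print(s[mask])                    # rows whose value is 4 or 8
```

`Series.isin` is usually the more convenient choice when the result should be used as a filter.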

**2.6 Create Series from a Dict**

In [123]:
obj = pd.Series({"Delhi": 100, "Mumbai": 200, "Bangalore": 300})

In [124]:
#Check the object
obj

Delhi        100
Mumbai       200
Bangalore    300
dtype: int64
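When a dict is combined with an explicit `index`, pandas looks up each label in the dict and fills labels it cannot find with `NaN`. The sketch below illustrates this; the label `"Chennai"` is a hypothetical key that is deliberately missing from the dict.

```python
import pandas as pd

population = {"Delhi": 100, "Mumbai": 200, "Bangalore": 300}
cities = ["Mumbai", "Delhi", "Chennai"]  # "Chennai" is hypothetical and not in the dict

s = pd.Series(population, index=cities)

print(s)          # values follow the order of `cities`; "Chennai" is NaN
print(s.isna())   # flags the missing entry
```

Because `NaN` is a floating-point value, the resulting Series is stored as `float64` rather than `int64`.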

**2.7 Series properties - Name and Index Name**

In [125]:
#Name the series (the values are populations)
obj.name = "Population"

In [126]:
#Name the index (the labels are cities)
obj.index.name = "City"

In [139]:
obj

City
Delhi        100
Mumbai       200
Bangalore    300
Name: Population, dtype: int64

<hr style="border:2px solid gray"> </hr>

# 3. DataFrame
A `DataFrame` represents a `rectangular table of data` and contains an `ordered collection of columns`, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a `row` and `column index`; it can be thought of as a `dict of Series` all sharing the `same index`.
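The "dict of Series sharing the same index" view can be made concrete with a small sketch. The Series names and labels below are illustrative assumptions, not part of the data used later in this guide.

```python
import pandas as pd

# Two Series that share the same index labels
year = pd.Series([2000, 2001, 2002], index=["a", "b", "c"])
pop = pd.Series([1.5, 1.7, 3.6], index=["a", "b", "c"])

# Each dict key becomes a column; the shared labels become the row index
df = pd.DataFrame({"year": year, "pop": pop})

col = df["pop"]                     # selecting a column returns a Series
print(type(col).__name__)
print(df.dtypes)                    # each column keeps its own dtype
print(df.index.equals(col.index))   # the column shares the DataFrame's row index
```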

**3.1 Create a basic DataFrame**

In [95]:
#Create a DataFrame from a dict of equal-length lists
data = {"state": ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'], 
        "year": [2000, 2001, 2002, 2001, 2002, 2003], 
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}

df = pd.DataFrame(data)

In [96]:
#View the first five rows of the DataFrame
df.head()

| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |


**3.2 Add Index to DataFrame**

In [141]:
#Add a custom row index to the DataFrame
df = pd.DataFrame(data, index=[1,2,3,4,5,6])

In [98]:
df

| | state | year | pop |
|---|---|---|---|
| 1 | Ohio | 2000 | 1.5 |
| 2 | Ohio | 2001 | 1.7 |
| 3 | Ohio | 2002 | 3.6 |
| 4 | Nevada | 2001 | 2.4 |
| 5 | Nevada | 2002 | 2.9 |
| 6 | Nevada | 2003 | 3.2 |


**3.3 Specify a sequence of columns**

In [99]:
#Specify the order of the columns
df = pd.DataFrame(data, index=[1,2,3,4,5,6], columns=["year","state","pop"])

In [100]:
df

| | year | state | pop |
|---|---|---|---|
| 1 | 2000 | Ohio | 1.5 |
| 2 | 2001 | Ohio | 1.7 |
| 3 | 2002 | Ohio | 3.6 |
| 4 | 2001 | Nevada | 2.4 |
| 5 | 2002 | Nevada | 2.9 |
| 6 | 2003 | Nevada | 3.2 |
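The `columns` argument also selects which dict keys are used, and a name that is not present in the data produces a column of missing values. A minimal sketch, where `"debt"` is a hypothetical column name that does not exist in the dict:

```python
import pandas as pd

data = {"state": ["Ohio", "Nevada"],
        "year": [2000, 2001],
        "pop": [1.5, 2.4]}

# "debt" is hypothetical: it is not a key of `data`, so the column is filled with NaN
df = pd.DataFrame(data, columns=["year", "state", "pop", "debt"])

print(df.columns.tolist())
print(df["debt"].isna().all())
```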


**3.4 Accessing DataFrame columns as Series**

In [101]:
#Access a single DataFrame column (returns a Series)
df["state"]

1      Ohio
2      Ohio
3      Ohio
4    Nevada
5    Nevada
6    Nevada
Name: state, dtype: object

In [102]:
#First five rows of a single column (integer slices are positional)
df["state"][0:5]

1      Ohio
2      Ohio
3      Ohio
4    Nevada
5    Nevada
Name: state, dtype: object

In [103]:
#First five rows of multiple columns (a list of names returns a DataFrame)
df[["state","year"]][0:5]

| | state | year |
|---|---|---|
| 1 | Ohio | 2000 |
| 2 | Ohio | 2001 |
| 3 | Ohio | 2002 |
| 4 | Nevada | 2001 |
| 5 | Nevada | 2002 |
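Rows can be selected explicitly with `.loc` (by index label) and `.iloc` (by position), which avoids ambiguity when the row index is itself made of integers. A small sketch with a reduced version of the data above; the variable names are illustrative:

```python
import pandas as pd

data = {"state": ["Ohio", "Ohio", "Nevada"],
        "year": [2000, 2001, 2001],
        "pop": [1.5, 1.7, 2.4]}
df = pd.DataFrame(data, index=[1, 2, 3])

row_by_label = df.loc[1]                  # row whose index label is 1
row_by_position = df.iloc[0]              # first row, regardless of its label
subset = df.loc[1:2, ["state", "year"]]   # label slice (inclusive) plus a list of columns

print(row_by_label.equals(row_by_position))
print(subset)
```

Here label `1` and position `0` refer to the same row, because the custom index starts at 1.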


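**3.5 Updating the DataFrame index and axis names**

In [104]:
#Name the row index and the column axis
df.index.name = 'year'
df.columns.name = 'state'

In [105]:
#Use the "year" column as the row index (the index takes the name "year" from the column)
df.index = df["year"]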

In [109]:
df

| state | year | state |
|---|---|---|
| year | | |
| 2000 | 2000 | Ohio |
| 2001 | 2001 | Ohio |
| 2002 | 2002 | Ohio |
| 2001 | 2001 | Nevada |
| 2002 | 2002 | Nevada |
| 2003 | 2003 | Nevada |


In [133]:
df["state"].values

array(['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'], dtype=object)

In [134]:
df["state"].index

Int64Index([2000, 2001, 2002, 2001, 2002, 2003], dtype='int64', name='year')
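An alternative to assigning `df.index` directly is `DataFrame.set_index`, which returns a new DataFrame and names the index after the chosen column. The sketch below assumes a reduced version of the data above; `drop=False` keeps `year` as a regular column as well, and the axis name `"field"` is a hypothetical choice for illustration.

```python
import pandas as pd

data = {"state": ["Ohio", "Nevada"],
        "year": [2000, 2001],
        "pop": [1.5, 2.4]}
df = pd.DataFrame(data)

# drop=False keeps "year" as a column in addition to using it as the row index
indexed = df.set_index("year", drop=False)
indexed.columns.name = "field"   # hypothetical name for the column axis

print(indexed.index.name)
print(indexed.index.tolist())
print("year" in indexed.columns)
```

Because `set_index` does not modify `df` in place by default, the original DataFrame remains available for comparison.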

<hr style="border:2px solid gray"> </hr>

# End of sheet