In [None]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

**Introduction to pandas Data Structures**

* To get started with pandas, you will need to get comfortable with its two workhorse
data structures: Series and DataFrame. While they are not a universal solution for
every problem, they provide a solid foundation for a wide variety of data tasks.

**Series**

* A Series is a one-dimensional array-like object containing a sequence of values of
the same type (similar to NumPy types) and an associated array of data labels,
called its index.
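A minimal sketch of the "same type" rule. When the input mixes integers and a float, pandas stores every value under one common dtype (here `float64`), and the index always holds exactly one label per value. The name `mixed` is only for illustration.

```python
import pandas as pd

# Integers and a float in one list: pandas stores them under a single dtype.
mixed = pd.Series([1, 2.5, 3])

print(mixed.dtype)       # float64
print(len(mixed.index))  # 3, one label per value
```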

In [None]:
obj = pd.Series([-11,32,3,54])

In [None]:
obj

| index | value |
|---|---|
| 0 | -11 |
| 1 | 32 |
| 2 | 3 |
| 3 | 54 |


**The string representation of a Series displayed interactively shows the index on the
left and the values on the right. Since we did not specify an index for the data, a
default one consisting of the integers 0 through N - 1 (where N is the length of the
data) is created. You can get the array representation and index object of the Series via
its array and index attributes, respectively.**

In [None]:
obj.array

<NumpyExtensionArray>
[np.int64(-11), np.int64(32), np.int64(3), np.int64(54)]
Length: 4, dtype: int64

In [None]:
obj.index

RangeIndex(start=0, stop=4, step=1)
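The `array` attribute returns a pandas extension array rather than a raw NumPy array. If you specifically need an `ndarray`, `to_numpy()` is one option; a short sketch reusing the same data:

```python
import numpy as np
import pandas as pd

obj = pd.Series([-11, 32, 3, 54])

# .array wraps the data in a pandas ExtensionArray;
# .to_numpy() returns a plain NumPy ndarray instead.
values = obj.to_numpy()
print(type(values))     # <class 'numpy.ndarray'>
print(list(obj.index))  # [0, 1, 2, 3]
```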

**Often, you’ll want to create a Series with an index identifying each data point with a
label:**

In [None]:
obj2 = pd.Series([9, 5, -3, 4, 89], index=["a", "b", "c", "d", "e"])

In [None]:
obj2

| index | value |
|---|---|
| a | 9 |
| b | 5 |
| c | -3 |
| d | 4 |
| e | 89 |
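Because each value is paired with exactly one label, the index you pass must have the same length as the data. A minimal sketch, using a hypothetical helper `build` (not part of pandas) that reports the `ValueError` pandas raises on a length mismatch:

```python
import pandas as pd

def build(values, labels):
    """Hypothetical helper: build a labelled Series, or return None on a length mismatch."""
    try:
        return pd.Series(values, index=labels)
    except ValueError as exc:
        print("ValueError:", exc)
        return None

ok = build([9, 5, -3], ["a", "b", "c"])
bad = build([9, 5, -3], ["a", "b"])  # one label short, so pandas raises
```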


In [None]:
obj2.to_dict()

{'a': 9, 'b': 5, 'c': -3, 'd': 4, 'e': 89}

In [None]:
obj2.array

<NumpyExtensionArray>
[np.int64(9), np.int64(5), np.int64(-3), np.int64(4), np.int64(89)]
Length: 5, dtype: int64

In [None]:
obj2.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

**Unlike NumPy arrays, a Series lets you use the labels in its index when selecting single
values or a set of values:**

In [None]:
obj2['a']

np.int64(9)

In [None]:
obj2[['a','c','d']]

| index | value |
|---|---|
| a | 9 |
| c | -3 |
| d | 4 |
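Plain `obj2[...]` accepts both labels and positions, which can be ambiguous when the index itself contains integers. The explicit `.loc` (by label) and `.iloc` (by position) accessors avoid that; a short sketch:

```python
import pandas as pd

obj2 = pd.Series([9, 5, -3, 4, 89], index=["a", "b", "c", "d", "e"])

print(obj2.loc["c"])                # -3, selected by label
print(obj2.iloc[2])                 # -3, selected by position
print(obj2.loc["b":"d"].tolist())   # [5, -3, 4]: label slices include the end label
print(obj2.iloc[1:3].tolist())      # [5, -3]: position slices exclude the end
```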


**Using NumPy functions or NumPy-like operations, such as filtering with a Boolean
array, scalar multiplication, or applying math functions, will preserve the index-value
link:**

In [None]:
obj2[obj2>0]

| index | value |
|---|---|
| a | 9 |
| b | 5 |
| d | 4 |
| e | 89 |


In [None]:
obj2 * 4

| index | value |
|---|---|
| a | 36 |
| b | 20 |
| c | -12 |
| d | 16 |
| e | 356 |


In [None]:
np.exp(obj2)

| index | value |
|---|---|
| a | 8103.084 |
| b | 148.4132 |
| c | 0.04978707 |
| d | 54.59815 |
| e | 4.489613e+38 |
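These operations can also be chained; each step keeps the labels of the values that survive it. A small sketch that filters, scales, and then applies `np.sqrt`:

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([9, 5, -3, 4, 89], index=["a", "b", "c", "d", "e"])

# Filter out non-positive values, multiply by 4, then take square roots.
# "c" is dropped by the filter; every other label stays attached to its value.
result = np.sqrt(obj2[obj2 > 0] * 4)
print(result)
```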


**Should you have data contained in a Python dictionary, you can create a Series from
it by passing the dictionary:**

In [None]:
sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}

In [None]:
obj3 = pd.Series(sdata)

In [None]:
obj3

| index | value |
|---|---|
| Ohio | 35000 |
| Texas | 71000 |
| Oregon | 16000 |
| Utah | 5000 |
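A Series built from a dictionary also behaves like one for membership tests: `in` checks the index labels, not the values. A quick sketch:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)

print("Ohio" in obj3)  # True
print(35000 in obj3)   # False: 35000 is a value, not a label
```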


**A Series can be converted back to a dictionary with its to_dict method:**

In [None]:
obj3.to_dict()

{'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

**When you are only passing a dictionary, the index in the resulting Series will respect
the order of the keys according to the dictionary’s keys method, which depends on
the key insertion order. You can override this by passing an index with the dictionary
keys in the order you want them to appear in the resulting Series:**

In [None]:
states = ["California", "Ohio", "Oregon", "Texas"]

In [None]:
obj4 = pd.Series(sdata, index=states)
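One way to think about passing `index=` together with a dictionary (offered as an informal equivalence, not a description of pandas internals): it behaves like building the Series from the dictionary first and then calling `reindex` with the desired labels. A quick check:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]

direct = pd.Series(sdata, index=states)          # index chosen at construction
reindexed = pd.Series(sdata).reindex(states)     # build first, then reorder/select labels

print(direct.equals(reindexed))  # True: same labels, same values, NaN in the same place
```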

**Here, the three values found in sdata were placed in the appropriate locations, but since
no value for "California" was found, it appears as NaN (Not a Number), which pandas
uses to mark missing or NA values. Since "Utah" was not included
in states, it is excluded from the resulting object.**

In [None]:
obj4

| index | value |
|---|---|
| California | NaN |
| Ohio | 35000.0 |
| Oregon | 16000.0 |
| Texas | 71000.0 |
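Note that the values in `obj4` are now floats (`35000.0`) even though `sdata` holds integers. NaN is a floating-point value, so introducing a missing entry upcasts the integer data to `float64`. A sketch comparing the two constructions:

```python
import numpy as np
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]

full = pd.Series(sdata)                   # every value present: integer dtype
partial = pd.Series(sdata, index=states)  # "California" missing, so NaN is inserted

print(full.dtype)     # int64
print(partial.dtype)  # float64
```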


**The isna and notna functions in pandas should be used to detect missing data:**

* Series also has these as instance methods:

In [None]:
obj4.isna()

| index | value |
|---|---|
| California | True |
| Ohio | False |
| Oregon | False |
| Texas | False |


In [None]:
pd.isna(obj4)

| index | value |
|---|---|
| California | True |
| Ohio | False |
| Oregon | False |
| Texas | False |


In [None]:
pd.notna(obj4)

| index | value |
|---|---|
| California | False |
| Ohio | True |
| Oregon | True |
| Texas | True |


In [None]:
obj4.notna()

| index | value |
|---|---|
| California | False |
| Ohio | True |
| Oregon | True |
| Texas | True |
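Because `isna` returns a Boolean Series, it combines naturally with other methods. A short sketch that counts the missing entries and then drops them with `dropna`:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

# Summing a Boolean Series counts its True entries.
print(obj4.isna().sum())              # 1
# dropna() returns a new Series without the missing entries.
print(obj4.dropna().index.tolist())   # ['Ohio', 'Oregon', 'Texas']
```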


**A useful Series feature for many applications is that it automatically aligns by index
label in arithmetic operations:**

In [None]:
obj3

| index | value |
|---|---|
| Ohio | 35000 |
| Texas | 71000 |
| Oregon | 16000 |
| Utah | 5000 |


In [None]:
obj4

| index | value |
|---|---|
| California | NaN |
| Ohio | 35000.0 |
| Oregon | 16000.0 |
| Texas | 71000.0 |


In [None]:
obj3 + obj4

| index | value |
|---|---|
| California | NaN |
| Ohio | 70000.0 |
| Oregon | 32000.0 |
| Texas | 142000.0 |
| Utah | NaN |


In [None]:
obj3 * obj4

| index | value |
|---|---|
| California | NaN |
| Ohio | 1225000000.0 |
| Oregon | 256000000.0 |
| Texas | 5041000000.0 |
| Utah | NaN |
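With `+`, a label present in only one operand yields NaN (Utah above). The method form `Series.add` accepts a `fill_value` that substitutes for a value missing on one side only; a label missing on both sides (California, absent from `obj3` and NaN in `obj4`) still comes out as NaN. A sketch:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])

# Treat a value missing on one side as 0 before adding.
total = obj3.add(obj4, fill_value=0)
print(total)
```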


**Both the Series object itself and its index have a name attribute, which integrates with
other areas of pandas functionality:**

In [None]:
obj4.name = "Population"

In [None]:
obj4.index.name = "States"

In [None]:
obj4

| States | Population |
|---|---|
| California | NaN |
| Ohio | 35000.0 |
| Oregon | 16000.0 |
| Texas | 71000.0 |
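One place these names show up is when a Series is turned into a DataFrame. A short sketch: after `reset_index()`, the index name becomes a column label and the Series name labels the value column.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj4 = pd.Series(sdata, index=["California", "Ohio", "Oregon", "Texas"])
obj4.name = "Population"
obj4.index.name = "States"

frame = obj4.reset_index()
print(frame.columns.tolist())  # ['States', 'Population']
```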


**A Series’s index can be altered in place by assignment:**

In [None]:
obj

| index | value |
|---|---|
| 0 | -11 |
| 1 | 32 |
| 2 | 3 |
| 3 | 54 |


In [None]:
obj.index = ["Bob","steve","arya","jon"]

In [None]:
obj

| index | value |
|---|---|
| Bob | -11 |
| steve | 32 |
| arya | 3 |
| jon | 54 |
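Assigning a new index only works when it supplies one label per value. A sketch showing that a list of the wrong length raises `ValueError` and leaves the previous labels in place:

```python
import pandas as pd

obj = pd.Series([-11, 32, 3, 54])
obj.index = ["Bob", "steve", "arya", "jon"]  # four labels for four values: accepted

failed = False
try:
    obj.index = ["Bob", "steve"]             # two labels for four values
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# The failed assignment did not change the existing labels.
print(obj.index.tolist())
```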
