<img src="Robotronix.jpg" width="800" height="400">

# Introduction to Pandas

In this section of the course we will learn how to use pandas for data analysis. You can think of pandas as an extremely powerful version of Excel, with a lot more features. Go through the notebooks in this order:

* Introduction to Pandas
* Series
* DataFrames


# Series

The first main data type we will learn about for pandas is the Series data type. Let's import Pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

Let's explore this concept through some examples:

In [1]:
import numpy as np

In [2]:
arr = np.array([1,2,'a','b',2,3,4])

In [3]:
arr

array(['1', '2', 'a', 'b', '2', '3', '4'], dtype='<U11')
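
To contrast this with a Series, here is a minimal sketch (the names `mixed`, `as_array`, and `as_series` are illustrative and not part of the course material). It suggests that NumPy coerces mixed input to a single string dtype, whereas a Series falls back to the `object` dtype and keeps each element's original Python type.

```python
import numpy as np
import pandas as pd

mixed = [1, 2, 'a', 'b']            # illustrative data, not from the course

as_array = np.array(mixed)          # NumPy converts every element to a string
as_series = pd.Series(mixed)        # pandas stores the elements as Python objects

print(as_array.dtype.kind)          # 'U' stands for a Unicode string dtype
print(as_series.dtype)              # object
print(type(as_series.iloc[0]))      # the first element is still a Python int
```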

In [3]:
import numpy as np
import pandas as pd

### Series Example

In [4]:
#A Series is a one-dimensional array-like object with an associated array
#of data labels, called its index.
obj = pd.Series([4, 7, -5, 3,'apple',0.5])
obj
#The string representation of a Series displayed interactively shows the index on the
#left and the values on the right. Since we did not specify an index for the data, a
#default one consisting of the integers 0 through N - 1 (where N is the length of the
#data) is created.

0        4
1        7
2       -5
3        3
4    apple
5      0.5
dtype: object

In [5]:
type(obj)

pandas.core.series.Series

In [6]:
#You can get the array representation and index object of the Series via
#its values and index attributes, respectively:
print(obj.values)
print(obj.index)

[4 7 -5 3 'apple' 0.5]
RangeIndex(start=0, stop=6, step=1)


A Series’s index can be altered in-place by assignment:
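
As a minimal sketch of index assignment (the Series `scores` and its labels are hypothetical, chosen only for illustration), assigning a list of the same length to the `index` attribute replaces the labels while leaving the values untouched:

```python
import pandas as pd

scores = pd.Series([4, 7, -5, 3])                 # default index 0..3

# Assign new labels in place; the list must have the same length as the data
scores.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

print(scores)
```

Assigning a list of a different length raises a `ValueError`, so the labels and the data always stay the same length.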

In [7]:
obj

0        4
1        7
2       -5
3        3
4    apple
5      0.5
dtype: object

In [11]:
obj2 = pd.Series([4, 7, -5, 3])
print(obj2)

0    4
1    7
2   -5
3    3
dtype: int64


In [13]:
obj2[2]

-5

In [13]:
data = [4, 7, -5, 3]


In [11]:
index=['d', 'b', 'a','e']

In [14]:
obj2 = pd.Series(data, index)  # same as pd.Series(data=data, index=index)

print(obj2)

d    4
b    7
a   -5
e    3
dtype: int64


In [17]:
obj2.index
obj2

d    4
b    7
a   -5
e    3
dtype: int64

In [18]:
print(obj2['b'])

7


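In [19]:
# Positional access: use .iloc, since obj2[1] on a labeled index is deprecated in recent pandas
obj2.iloc[1]

7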
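
Because square brackets on a Series can mean either a label or a position, pandas also provides explicit accessors. The following is a minimal sketch, assuming a Series `s` built like `obj2` above: `.loc` looks values up by label and `.iloc` by integer position.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'e'])

by_label = s.loc['b']          # label-based lookup
by_position = s.iloc[1]        # position-based lookup (second element)
several = s.loc[['a', 'd']]    # a list of labels returns a new Series

print(by_label, by_position)
print(several)
```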

## Series

You can use labels in the index when selecting single
values or a set of values:

In [20]:
print(obj2['a'])  

-5


In [21]:
obj2

d    4
b    7
a   -5
e    3
dtype: int64

In [22]:
obj2['b']=5
print(obj2)

d    4
b    5
a   -5
e    3
dtype: int64


In [23]:
obj2['d'] = 6

In [24]:
print(obj2)

d    6
b    5
a   -5
e    3
dtype: int64


In [27]:
type(obj2)

pandas.core.series.Series

In [28]:
obj2['a']=-5

In [29]:
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

print(obj2)

d    4
b    7
a   -5
c    3
dtype: int64


In [71]:
# Select several values at once by passing a list of labels: ['c','a','d']
obj2[['c','a','d']]

c    3
a   -5
d    4
dtype: int64

In [72]:
obj2 > 0

d     True
b     True
a    False
c     True
dtype: bool

Using NumPy functions or NumPy-like operations, such as filtering with a boolean
array, scalar multiplication, or applying math functions, will preserve the index-value
link:

In [73]:
obj2[obj2 < 0]

a   -5
dtype: int64

In [74]:
print(obj2)

d    4
b    7
a   -5
c    3
dtype: int64


In [75]:
obj2 * 3

d    12
b    21
a   -15
c     9
dtype: int64

In [77]:
obj2

d    4
b    7
a   -5
c    3
dtype: int64

In [78]:
np.exp(obj2)

d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64

In [79]:
obj2

d    4
b    7
a   -5
c    3
dtype: int64

Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping
of index values to data values. For example, the in operator checks the index labels:

In [80]:
's' in obj2

False
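
A few more dict-like operations can be sketched as follows, assuming a Series `s` built like `obj2` above (the missing label `'z'` is made up for illustration). Note that membership tests look at the index labels, not at the values.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

has_b = 'b' in s             # True: 'b' is an index label
has_7 = 7 in s               # False: 7 is a value, not a label
fallback = s.get('z', 0)     # like dict.get, returns the default for a missing label
as_dict = s.to_dict()        # convert back to a plain Python dict

print(has_b, has_7, fallback)
print(as_dict)
```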

If you have data contained in a Python dict, you can create a Series from it by
passing the dict:

In [81]:
sdata = pd.Series([35000,71000,16000], index = ['Dl','Mh','Ka'])
print(sdata)

Dl    35000
Mh    71000
Ka    16000
dtype: int64


In [82]:
sdata = {'Dl': 35000, 'Kh': 71000, 'Mp': 16000, 'Ka': 5000}
type(sdata)

dict

In [83]:
obj3 = pd.Series(sdata)
print(obj3)
type(obj3)

Dl    35000
Kh    71000
Mp    16000
Ka     5000
dtype: int64


pandas.core.series.Series

When you pass only a dict, the index of the resulting Series follows the dict’s
keys in insertion order, as the output above shows. You can override this by passing
the labels in the order you want them to appear in the resulting Series:

In [84]:
# COVID-19 example: the state labels to use as the index
states = ['Ta','Dl', 'Mh', 'Mp','Ka']
type(states)



list

In [85]:
sdata = {'Dl': 35000, 'Kh': 71000, 'Mp': 16000, 'Ka': 5000}
states = ['Ta','Dl', 'Mh', 'Mp','Ka']

obj4 = pd.Series(sdata, index=states)

obj4

Ta        NaN
Dl    35000.0
Mh        NaN
Mp    16000.0
Ka     5000.0
dtype: float64

Here, the three values found in sdata were placed in the appropriate locations, but since
no values for 'Ta' and 'Mh' were found, they appear as NaN (not a number), which pandas
uses to mark missing or NA values. Since 'Kh' was not included in
states, it is excluded from the resulting object.

We will use the terms “missing” and “NA” interchangeably to refer to missing data. The
isnull and notnull functions in pandas should be used to detect missing data:

In [86]:
obj4

Ta        NaN
Dl    35000.0
Mh        NaN
Mp    16000.0
Ka     5000.0
dtype: float64

In [87]:
pd.isnull(obj4)

Ta     True
Dl    False
Mh     True
Mp    False
Ka    False
dtype: bool

In [45]:
pd.notnull(obj4)



Ta    False
Dl     True
Mh    False
Mp     True
Ka     True
dtype: bool
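
Once missing values are detected, they usually need to be counted, dropped, or filled. The following is a minimal sketch, assuming a Series `cases` that mirrors `obj4` above; the fill value of 0 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

cases = pd.Series([np.nan, 35000.0, np.nan, 16000.0, 5000.0],
                  index=['Ta', 'Dl', 'Mh', 'Mp', 'Ka'])

mask = cases.isna()            # method form, same result as pd.isnull(cases)
n_missing = int(mask.sum())    # each True counts as 1
present = cases.dropna()       # keep only the entries that have a value
filled = cases.fillna(0)       # or replace missing entries with a chosen value

print(n_missing)
print(present)
print(filled)
```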

In [89]:
obj4

Ta        NaN
Dl    35000.0
Mh        NaN
Mp    16000.0
Ka     5000.0
dtype: float64

In [90]:
obj4.name='covid'

In [91]:
obj4

Ta        NaN
Dl    35000.0
Mh        NaN
Mp    16000.0
Ka     5000.0
Name: covid, dtype: float64

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [93]:
labels = ['a','b','c']# index 
my_list = [10,20,30]#values
#arr = np.array([10,20,30])
#d = {'a':10,'b':20,'c':30}

In [95]:
pd.Series(labels,my_list)

10    a
20    b
30    c
dtype: object

**Using Lists**

In [51]:
pd.Series(data=labels,index=my_list)

10    a
20    b
30    c
dtype: object

In [52]:
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30
dtype: int64

In [53]:
pd.Series(my_list,labels)

a    10
b    20
c    30
dtype: int64
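
The NumPy array and dictionary cases work the same way. A minimal sketch, reusing the `arr` and `d` values that are commented out in the cell above:

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
arr = np.array([10, 20, 30])
d = {'a': 10, 'b': 20, 'c': 30}

from_array = pd.Series(arr, index=labels)   # array values with explicit labels
from_dict = pd.Series(d)                    # dict keys become the index

print(from_array)
print(from_dict)
```

Both calls produce the same labels and values as `pd.Series(my_list, labels)`.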

In [96]:
obj4

Ta        NaN
Dl    35000.0
Mh        NaN
Mp    16000.0
Ka     5000.0
Name: covid, dtype: float64

Both the Series object itself and its index have a name attribute, which integrates with
other key areas of pandas functionality:

In [97]:
obj4.name = 'population'

obj4.index.name = 's_name'


In [98]:
obj4

s_name
Ta        NaN
Dl    35000.0
Mh        NaN
Mp    16000.0
Ka     5000.0
Name: population, dtype: float64
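
One place where these names matter is conversion to a DataFrame. The following is a minimal sketch, assuming a small Series `s` with the same names as `obj4`: `reset_index` turns the index into an ordinary column, and both names are reused as column labels.

```python
import pandas as pd

s = pd.Series([35000.0, 16000.0], index=['Dl', 'Mp'], name='population')
s.index.name = 's_name'

frame = s.reset_index()        # the index becomes a regular column
print(list(frame.columns))     # column labels come from the two name attributes
print(frame)
```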

In [99]:
# Let's add the two Series and see what happens
sdata = {'Dl': 35000, 'Kh': 71000, 'Mp': 16000, 'Ka': 5000, 'Ta':2000}
states = ['Ta','Dl', 'Mh', 'Mp','Ka']

obj4 = pd.Series(sdata, index=states)

obj4

Ta     2000.0
Dl    35000.0
Mh        NaN
Mp    16000.0
Ka     5000.0
dtype: float64

In [100]:
obj3

Dl    35000
Kh    71000
Mp    16000
Ka     5000
dtype: int64

In [59]:
obj4

Ta     2000.0
Dl    35000.0
Mh        NaN
Mp    16000.0
Ka     5000.0
dtype: float64

In [101]:
obj4+obj3

Dl    70000.0
Ka    10000.0
Kh        NaN
Mh        NaN
Mp    32000.0
Ta        NaN
dtype: float64
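
The result above shows that arithmetic aligns the two Series by index label: a label present in only one of them gives NaN, and a NaN operand stays NaN. As a minimal sketch (the Series `a` and `b` are small stand-ins for `obj4` and `obj3`), the `add` method with `fill_value` treats a label missing from one side as a chosen value instead:

```python
import pandas as pd

a = pd.Series({'Dl': 35000, 'Mp': 16000, 'Ta': 2000})
b = pd.Series({'Dl': 35000, 'Kh': 71000, 'Mp': 16000})

plain = a + b                      # labels found in only one Series give NaN
filled = a.add(b, fill_value=0)    # a label missing on one side counts as 0

print(plain)
print(filled)
```

Whether 0 is a sensible substitute depends on the data; for counts it often is, for measurements it may hide a real gap.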

In [102]:
np.arange(0,4)

array([0, 1, 2, 3])

In [61]:
data=np.arange(0,4)
labels=['a','b','c','d']
series_1=pd.Series(data,index=labels)

In [103]:
print(series_1)

a    0
b    1
c    2
d    3
dtype: int32


In [104]:
data=np.arange(0,4)
data

array([0, 1, 2, 3])

### Great work!
We will now move on to DataFrames.
