<a href="https://colab.research.google.com/github/almasfiza/Data-Science/blob/master/WeekTwoA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**WEEK TWO: A - SERIES DATA STRUCTURE IN PANDAS, B - DATAFRAME**

A-SERIES DATA STRUCTURE

A Series can be thought of as a cross between a list and a dictionary: the values are ordered like a list, and each value has a label like a dictionary key. If the series holds numeric data, its dtype is an integer or floating-point type (for example int64 or float64). If it holds strings, its dtype is object.

Missing numeric values are represented by NaN, which stands for "not a number". NaN is a floating-point value, so when a missing value appears in an integer series, pandas changes the dtype from int to float. In an object series, a missing value is stored as Python's None. Comparing NaN with == is unreliable: np.nan == np.nan evaluates to False. That is why we should use built-in functions to test for NaN, such as np.isnan(np.nan), which returns True.

Furthermore, np.nan == None evaluates to False.

The name pandas is derived from the term **panel data**, an econometric term for datasets that include observations over multiple time periods for the same individuals. The pandas library was initially used for handling financial datasets.

In [None]:
#importing the pandas library
import pandas as pd


In [None]:
#creating a series using a list
country = ["India", "Canada", "USA", "Germany"]
pd.Series(country)


0      India
1     Canada
2        USA
3    Germany
dtype: object

In [None]:
lucky_numbers = [4,7,10]
pd.Series(lucky_numbers)

0     4
1     7
2    10
dtype: int64

In [None]:
#notice that the dtype changes from int to float when a missing value (None, stored as NaN) is added
lucky_numbers = [4,3,None]
pd.Series(lucky_numbers)

0    4.0
1    3.0
2    NaN
dtype: float64

In [None]:
country = ["India", "Canada", "USA", None]
pd.Series(country)

0     India
1    Canada
2       USA
3      None
dtype: object

In [None]:
#looking at the ambiguity of operations involving NaN
import numpy as np
print(np.nan == None)
print(np.nan == np.nan)
print(np.isnan(np.nan))

False
False
True
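
Because equality checks fail on NaN, a missing-value test that works for both numeric and object series is often more convenient. The following is a minimal sketch using `pd.isna` and `Series.isna`; the variable names are made up for illustration, and `dtype=object` is passed explicitly so that None is kept as None regardless of the pandas version.

```python
import numpy as np
import pandas as pd

# Numeric series: None is converted to NaN and the dtype becomes float64
numbers = pd.Series([4, 3, None])

# Object series (dtype forced here for illustration): None is stored as None
countries = pd.Series(["India", "Canada", None], dtype=object)

# isna flags missing entries in both cases, unlike == comparisons
numeric_missing = pd.isna(numbers).tolist()
object_missing = countries.isna().tolist()

print(numbers.dtype)     # float64
print(numeric_missing)   # [False, False, True]
print(object_missing)    # [False, False, True]
print(pd.isna(None), pd.isna(np.nan))  # True True
```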


In [None]:
#making a series using a dictionary: the keys become the index labels
capitals = {'India' : 'Delhi',
            'Canada' : 'Ottawa',
            'USA' : 'Washington DC',
            'Germany' : 'Berlin',
            'South Korea' : 'Seoul'
            }
cap = pd.Series(capitals)
cap

India                  Delhi
Canada                Ottawa
USA            Washington DC
Germany               Berlin
South Korea            Seoul
dtype: object

**QUERYING**

In [None]:
#finding the index labels
cap.index

Index(['India', 'Canada', 'USA', 'Germany', 'South Korea'], dtype='object')

In [None]:
#Another way of making a labelled series: pass a list of values and a separate list of index labels
cap2 = pd.Series(['Baghdad','Ankara'], index = ['Iraq','Turkey'])
cap2

Iraq      Baghdad
Turkey     Ankara
dtype: object

In [None]:
#if an index label has no matching key in the dictionary, its value is filled with NaN
test_series = pd.Series(capitals, index = ['South Korea','Canada','China'])
test_series

South Korea     Seoul
Canada         Ottawa
China             NaN
dtype: object
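
To find out which requested labels had no match, the padded entries can be picked out with `isna()`. This is a small sketch with a shortened, hypothetical version of the `capitals` dictionary; `wanted`, `selected` and `unmatched` are names chosen only for this example.

```python
import pandas as pd

capitals = {"India": "Delhi", "Canada": "Ottawa", "South Korea": "Seoul"}
wanted = ["South Korea", "Canada", "China"]

# Labels missing from the dictionary get a missing value
selected = pd.Series(capitals, index=wanted)

# Keep only the entries that are missing and read off their labels
unmatched = selected[selected.isna()].index.tolist()
print(unmatched)  # ['China']

# dropna() returns a new series without the padded entries
matched = selected.dropna()
print(matched.index.tolist())  # ['South Korea', 'Canada']
```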

In [None]:
cap

India                  Delhi
Canada                Ottawa
USA            Washington DC
Germany               Berlin
South Korea            Seoul
dtype: object

In [None]:
cap2

Iraq      Baghdad
Turkey     Ankara
dtype: object

In [None]:
#iloc is used to query a value by its integer position (counting from 0)
#South Korea is the fifth entry in the cap series, so its position is 4
cap.iloc[4]

'Seoul'

In [None]:
#loc is used to query a value using the index label
cap.loc["South Korea"]

'Seoul'

In [None]:
#when the index labels are not integers, pandas treats an integer passed to [] as a position (like iloc)
#newer pandas versions deprecate this fallback, so cap.iloc[4] is the safer spelling
cap[4]

'Seoul'

In [None]:
#but in cases where the index label can be a number, this may give rise to ambiguity
example = pd.Series(['First','Second','Third'], index = [1,2,3])
print(example)


1     First
2    Second
3     Third
dtype: object


In [None]:
#this works: with an integer index, [] does a label lookup (like loc)
example[1]

'First'

In [None]:
#but asking for position 0 (as iloc would) through [] fails, because there is no label 0
example[0]

KeyError: ignored

In [None]:
#therefore it is better to state explicitly whether loc or iloc is meant
example.iloc[0]

'First'

In [None]:
example.loc[1]

'First'
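
The difference between the two indexers also shows up when slicing. The sketch below reuses the `example` series from above; the helper variable names are illustrative only.

```python
import pandas as pd

example = pd.Series(["First", "Second", "Third"], index=[1, 2, 3])

by_position = example.iloc[0]  # position 0
by_label = example.loc[1]      # label 1
print(by_position, by_label)   # First First

# Label 0 does not exist, so a label-based lookup raises KeyError
try:
    example.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)  # False

# iloc slices exclude the end position, loc slices include the end label
position_slice = example.iloc[1:2].tolist()
label_slice = example.loc[1:2].tolist()
print(position_slice)  # ['Second']
print(label_slice)     # ['First', 'Second']
```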

**BROADCASTING FUNCTIONS AND USING THE MAGIC FUNCTION TIMEIT TO PROVE THEIR SIGNIFICANCE**

Magic functions belong to IPython, the kernel behind Jupyter and Colab notebooks, and begin with a % sign. A cell magic, which applies to a whole cell rather than a single line, is prefixed with %%.
The timeit magic runs the cell repeatedly for a set number of loops and reports the time per loop.

For heavy computation, explicit for loops can be much slower than broadcasting (vectorized) operations.

Broadcasting treats the list or series as a vector: an operation applied to the series name affects every element inside it.

In [33]:
nums = pd.Series(np.random.randint(0,1000,10000))
nums.head()

0    541
1    964
2    933
3    381
4    288
dtype: int64

In [34]:
len(nums)

10000

In [35]:
%%timeit -n 100
#the cell magic goes on the first line of the cell; -n 100 runs the cell 100 times per measurement
#calculating the sum using a for loop
total = 0
for num in nums:
  total += num

100 loops, best of 3: 1.52 ms per loop


In [36]:
%%timeit -n 100
#-n 100 runs the cell 100 times per measurement
#calculating the sum using a vectorized function
total = np.sum(nums)

100 loops, best of 3: 142 µs per loop
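
The %%timeit magic only exists inside IPython-based notebooks. In a plain Python script, a similar comparison can be sketched with the standard library `timeit` module. The function names `loop_sum` and `vectorized_sum` are hypothetical helpers written for this example, and the measured times depend on the machine, so no specific numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

nums = pd.Series(np.random.randint(0, 1000, 10000))

def loop_sum(series):
    """Add the values one at a time with an explicit for loop."""
    total = 0
    for value in series:
        total += value
    return total

def vectorized_sum(series):
    """Let pandas add all values in one vectorized call."""
    return series.sum()

# Both approaches must agree on the result
assert loop_sum(nums) == vectorized_sum(nums)

# timeit.timeit returns the total time in seconds for `number` calls
loop_seconds = timeit.timeit(lambda: loop_sum(nums), number=20)
vector_seconds = timeit.timeit(lambda: vectorized_sum(nums), number=20)
print(f"loop: {loop_seconds:.4f} s, vectorized: {vector_seconds:.4f} s")
```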


In [38]:
#increment each number in the series by ten
num = pd.Series([10,20,30,40,50,60])
print(num)
num += 10
print(num)

0    10
1    20
2    30
3    40
4    50
5    60
dtype: int64
0    20
1    30
2    40
3    50
4    60
5    70
dtype: int64
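
The `+=` form above updates `num` itself. As a short sketch of the non-in-place alternative, an arithmetic expression such as `num + 10` builds a new series and leaves the original untouched; `shifted` and `doubled` are names invented for this example.

```python
import pandas as pd

num = pd.Series([10, 20, 30, 40, 50, 60])

# The scalar is applied to every element and a new series is returned
shifted = num + 10

# Other arithmetic operators work element-wise in the same way
doubled = num * 2

print(shifted.tolist())  # [20, 30, 40, 50, 60, 70]
print(doubled.tolist())  # [20, 40, 60, 80, 100, 120]
print(num.tolist())      # [10, 20, 30, 40, 50, 60]
```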


**BONUS**

In [39]:
#assigning to a label that does not exist adds a new entry to the series (cap.loc[label] = value works the same way)
cap

India                  Delhi
Canada                Ottawa
USA            Washington DC
Germany               Berlin
South Korea            Seoul
dtype: object

In [41]:
cap["New Zealand"] = "Wellington"

In [42]:
cap

India                  Delhi
Canada                Ottawa
USA            Washington DC
Germany               Berlin
South Korea            Seoul
New Zealand       Wellington
dtype: object

In [45]:
#appending two series
print(cap)
print(cap2)


India                  Delhi
Canada                Ottawa
USA            Washington DC
Germany               Berlin
South Korea            Seoul
New Zealand       Wellington
dtype: object
Iraq      Baghdad
Turkey     Ankara
dtype: object


In [46]:
#Series.append was removed in pandas 2.0; pd.concat gives the same result
print(pd.concat([cap, cap2]))

India                  Delhi
Canada                Ottawa
USA            Washington DC
Germany               Berlin
South Korea            Seoul
New Zealand       Wellington
Iraq                 Baghdad
Turkey                Ankara
dtype: object


In [47]:
#checking if appending affects the original series (hint: no it does not)
print(cap)
print(cap2)


India                  Delhi
Canada                Ottawa
USA            Washington DC
Germany               Berlin
South Korea            Seoul
New Zealand       Wellington
dtype: object
Iraq      Baghdad
Turkey     Ankara
dtype: object


In [49]:
#the combined series is a new object, so store it in a variable to keep it
copy = pd.concat([cap, cap2])

In [50]:
print(copy)

India                  Delhi
Canada                Ottawa
USA            Washington DC
Germany               Berlin
South Korea            Seoul
New Zealand       Wellington
Iraq                 Baghdad
Turkey                Ankara
dtype: object
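
`Series.append` was removed in pandas 2.0, and `pd.concat` is the usual way to combine series on current versions. The sketch below uses shortened, hypothetical versions of `cap` and `cap2`, and also shows the optional `verify_integrity=True` argument, which raises a ValueError when the combined index would contain duplicate labels.

```python
import pandas as pd

cap = pd.Series({"India": "Delhi", "Canada": "Ottawa"})
cap2 = pd.Series(["Baghdad", "Ankara"], index=["Iraq", "Turkey"])

# concat returns a new series; cap and cap2 are left unchanged
combined = pd.concat([cap, cap2])
print(combined.index.tolist())  # ['India', 'Canada', 'Iraq', 'Turkey']
print(len(cap), len(cap2))      # 2 2

# Combining a series with itself repeats every label, which is rejected here
try:
    pd.concat([cap, cap], verify_integrity=True)
    duplicates_accepted = True
except ValueError:
    duplicates_accepted = False
print(duplicates_accepted)  # False
```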
