<a href="https://colab.research.google.com/github/a-forty-two/DFE6/blob/main/04_Python_Pandas_DataSeries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import pandas as pd
import datetime

## DataSeries

In [51]:
my_readings = [3.12, 4, 3.24, 3.67, 6, 84]

We can create a DataSeries (a pandas `Series`) from any list or `np.array`:

In [52]:
ds_readings = pd.Series(my_readings)
ds_readings
# A Series represents a single COLUMN of a DataFrame!

0     3.12
1     4.00
2     3.24
3     3.67
4     6.00
5    84.00
dtype: float64
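As a small illustration of the point above, a series can also be built from a NumPy array, and an index or name can be supplied at construction time. This is only a sketch; the names `arr_readings` and `labelled` are made up for the example and do not appear elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

my_readings = [3.12, 4, 3.24, 3.67, 6, 84]

# Building from a NumPy array gives the same values as building from the list
arr_readings = pd.Series(np.array(my_readings))

# index and name are optional constructor arguments (illustrative values)
labelled = pd.Series(my_readings, index=list("abcdef"), name="reading")

print(arr_readings.dtype)   # float64
print(labelled["c"])        # 3.24
```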

In [53]:
xyz_index = ['a','b','c','d','e','f']
ds_readings.index = xyz_index

In [54]:
## Output Index
ds_readings.index

Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')

In [55]:
ds_readings

a     3.12
b     4.00
c     3.24
d     3.67
e     6.00
f    84.00
dtype: float64

In [56]:
# reset it to original so that future ops can run 
ds_readings = pd.Series(my_readings)
ds_readings.index

RangeIndex(start=0, stop=6, step=1)

In [57]:
## Output Value
ds_readings.values

array([ 3.12,  4.  ,  3.24,  3.67,  6.  , 84.  ])

In [58]:
ds_readings[0]

3.12

In [59]:
ds_readings[4]

6.0

In [60]:
ds_readings.mean(), ds_readings.std()
# zscore = (data-mean)/std

(17.338333333333335, 32.67413069488868)

In [61]:
mu, sigma = ds_readings.mean(), ds_readings.std()
ds = (ds_readings - mu)/sigma
ds

0   -0.435156
1   -0.408223
2   -0.431483
3   -0.418323
4   -0.347013
5    2.040197
dtype: float64

In [70]:
(ds<2) 

0     True
1     True
2     True
3     True
4     True
5    False
dtype: bool

In [75]:
outliers = ds[(ds>2) | (ds<-2)] # | means OR, & means AND
outliers

5    2.040197
dtype: float64
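The z-score filter above can be wrapped in a small helper. This is a sketch only: the function name `zscore_outliers` and the default threshold are assumptions for illustration. Note that `Series.std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

def zscore_outliers(series, threshold=2.0):
    """Return the entries whose absolute z-score exceeds `threshold`."""
    # Series.std() defaults to ddof=1 (sample standard deviation)
    z = (series - series.mean()) / series.std()
    return series[z.abs() > threshold]

readings = pd.Series([3.12, 4, 3.24, 3.67, 6, 84])
found = zscore_outliers(readings)
print(found)   # only the reading at index 5 (84.0) is flagged
```

Using `z.abs() > threshold` is equivalent to `(z > threshold) | (z < -threshold)` and avoids repeating the condition.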

**Masking**

In [40]:
## Note the list input
# A list of booleans inside [] acts as a FILTER: wherever the entry is True we
# get the value; wherever it is False the value is skipped!

# Keep only alternate elements

ds_readings[[True, False, True, False, True, False]]

0    3.12
2    3.24
4    6.00
dtype: float64

In [36]:
x = (ds_readings % 2 == 0)
list(x)

[False, True, False, False, True, False]

In [44]:
ds_readings[ds_readings % 2 == 0]

1    4.0
4    6.0
dtype: float64

In [None]:
ds_readings + 10

In [None]:
ds_readings > 3.5

### Q: Why might the above be useful?

The boolean output is useful as a mask: it can filter this series, or another series that shares the same index.

In [None]:
ds_readings[ds_readings > 3.5]
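To make the "mask for another series" idea concrete, here is a minimal sketch. The `sensors` series is hypothetical companion data invented for the example; it shares the default integer index with the readings.

```python
import pandas as pd

readings = pd.Series([3.12, 4, 3.24, 3.67, 6, 84])
# Hypothetical companion data: one label per reading
sensors = pd.Series(["s1", "s2", "s3", "s4", "s5", "s6"])

mask = readings > 3.5          # boolean Series with the same index
high_sensors = sensors[mask]   # labels whose reading exceeds 3.5

print(list(high_sensors))      # ['s2', 's4', 's5', 's6']
```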

Filtering and arithmetic operators can be combined:

In [None]:
ds_readings[ds_readings > 3.5] * 2

Changing the values, either all of them or only a masked subset, is just as easy:

In [None]:
ds_readings[ds_readings > 3.5] = 0

In [None]:
ds_readings
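Assigning through a mask overwrites the series in place. If the original values are still needed, `Series.mask` and `Series.where` return a new series instead. A brief sketch, with variable names chosen for illustration:

```python
import pandas as pd

readings = pd.Series([3.12, 4, 3.24, 3.67, 6, 84])

# mask() replaces values where the condition is True and returns a new Series
zeroed = readings.mask(readings > 3.5, 0)

# where() is the complement: it keeps values where the condition is True
kept = readings.where(readings <= 3.5, 0)

print(list(zeroed))        # [3.12, 0.0, 3.24, 0.0, 0.0, 0.0]
print(readings.iloc[5])    # 84.0, the original is unchanged
```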

## Exercise
   * ### Create a new DataSeries on a topic of your choosing (numeric, length = 8)
   * ### Output the 2nd, last, and last two elements
   * ### Subtract a number from all elements
   * ### Generate and apply a mask
   * ### Use the mask to set values to 0

### How do I modify the index?

In [None]:
timings = [datetime.time(1, 3, increment) for increment in range(6)]

In [None]:
timings

In [None]:
ds_readings.index = timings

In [None]:
ds_readings
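Once the index holds time labels rather than integers, it helps to be explicit about label-based versus position-based access. The sketch below uses `.loc` for labels and `.iloc` for positions; relying on a bare integer such as `ds_readings[0]` for positional access is deprecated in recent pandas releases, so `.iloc` is the safer choice. The series here is rebuilt from the original readings for illustration.

```python
import datetime
import pandas as pd

readings = pd.Series([3.12, 4, 3.24, 3.67, 6, 84])
readings.index = [datetime.time(1, 3, s) for s in range(6)]

# .loc looks up by label, .iloc by position
by_label = readings.loc[datetime.time(1, 3, 2)]
by_position = readings.iloc[2]
last_two = readings.iloc[-2:]

print(by_label, by_position)   # 3.24 3.24
print(list(last_two))          # [6.0, 84.0]
```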

The `.value_counts()` method finds all the unique values in the series and gives the number of occurrences of each one:

In [None]:
ds_readings.value_counts()

We can also sort the values, in ascending or descending order:

In [None]:
ds_readings.sort_values()

In [None]:
ds_readings.sort_values(ascending = False)
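A short sketch of what these calls return, assuming the readings have already had every value above 3.5 set to 0 as in the earlier cell (the default integer index is used here for brevity):

```python
import pandas as pd

readings = pd.Series([3.12, 0, 3.24, 0, 0, 0])

# value_counts() is sorted by count, most frequent value first
counts = readings.value_counts()
print(counts.index[0], counts.iloc[0])   # 0.0 4

# sort_values() returns a new Series and keeps each value's original label
ordered = readings.sort_values(ascending=False)
print(list(ordered.index[:2]))           # [2, 0]
```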

### np.nan

`np.nan` represents a value that should exist but is missing. pandas provides an easy way to check for missing values.

We first add `np.nan` to the series:

In [None]:
ds_readings[datetime.time(0,3,6)] =  np.nan 

In [None]:
ds_readings

In [None]:
ds_readings[datetime.time(0,3,7)] =  3.48 

In [None]:
ds_readings

pandas provides the `.isna()` method for checking for missing values:

In [None]:
ds_readings.isna()

In [None]:
ds_readings[ds_readings.isna()]

In [None]:
result = ds_readings[ds_readings.isna()]

In [None]:
result.index

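To remove items, use `.drop()` and pass the index labels of the entries to remove. Note that `.drop()` returns a new series and leaves `ds_readings` unchanged unless the result is assigned back.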

In [None]:
ds_readings.drop(result.index)

In [None]:
ds_readings.isna()
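The following sketch shows why `.isna()` still reports a missing value after the `.drop()` call: the result was not assigned. The small series and the names `dropped` and `same` are illustrative only.

```python
import numpy as np
import pandas as pd

readings = pd.Series([3.12, np.nan, 3.24], index=["a", "b", "c"])

missing = readings[readings.isna()]
dropped = readings.drop(missing.index)   # new Series; `readings` is untouched

print(readings.isna().sum())   # 1
print(dropped.isna().sum())    # 0

# dropna() gives the same result in one step
same = readings.dropna()
print(dropped.equals(same))    # True
```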

Reduction methods collapse the boolean mask into a single value:

In [None]:
ds_readings.isna().any()

In [None]:
ds_readings.isna().all()

In [None]:
ds_readings.isna().sum()

In [None]:
ds_readings.unique()

### Mappings

In [None]:
mapping = {0: 10.0, np.nan: 0.0}
ds_readings.replace(mapping)

In [None]:
def myround(x):
    return round(x, 1)

In [None]:
ds_readings.map(myround)

In [None]:
ds_readings.map(lambda x: round(x,1))
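When the series contains `np.nan`, the mapped function receives it like any other value, which can give surprising results. `Series.map` accepts `na_action="ignore"` to skip missing entries. A hedged sketch, with a made-up `label` function:

```python
import numpy as np
import pandas as pd

def label(x):
    # Illustrative helper: classify a reading against a 3.5 threshold
    return "high" if x > 3.5 else "low"

readings = pd.Series([3.12, np.nan, 6.0])

# NaN > 3.5 evaluates to False, so the missing value is labelled "low"
naive = readings.map(label)

# na_action="ignore" leaves NaN as NaN instead of passing it to label()
safe = readings.map(label, na_action="ignore")

print(list(naive))   # ['low', 'low', 'high']
print(safe.isna().tolist())   # [False, True, False]
```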

## Exercise
   * ### Change the index of the series you created in the previous exercise so that it is indexed by time
   * ### Insert several np.nan values
   * ### Remove these values using `.isna()` and `.drop()`
   * ### Define a function which squares numbers given as input and apply it across the series using `.map()`
    

In [76]:
a = 1
b = 2

In [77]:
# result => a=2, b= 1

# WITHOUT using a 3rd variable!

In [78]:
# write a PYTHON function to swap values of A and B WITHOUT using a 3rd variable!

In [None]:
# arithmetic -> + - / *
# comparison -> > < <= >=
# ternary (conditional expression) -> true_value if condition else false_value
# bitwise -> & | ^
# 2 -> 0010
# 4 -> 0100
# 2 & 4 -> 0010 & 0100 -> 0000

#   0010
# & 0100
# = 0000

In [79]:
a=1
b= 200
a = a^b
b = a^b
a = a^b 
a,b

(200, 1)
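For comparison, here is a sketch of two ways to swap without a named temporary. The function name `xor_swap` is an assumption for illustration. In Python, tuple assignment is the idiomatic form; the XOR trick only works for integers.

```python
def xor_swap(a, b):
    """Swap two integers using three XOR operations and return them."""
    a = a ^ b   # a now holds a XOR b
    b = a ^ b   # (a XOR b) XOR b gives the original a
    a = a ^ b   # (a XOR b) XOR original a gives the original b
    return a, b

# Idiomatic Python: tuple assignment swaps values of any type
a, b = 1, 200
a, b = b, a
print(a, b)               # 200 1

print(xor_swap(1, 200))   # (200, 1)
```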

In [80]:
# exclusive OR:
# ONLY 1 of the bits should be TRUE!
# diff from OR-> at least 1 bit should be true!
# OR-> 0 | 1 = 1, 1 | 0 = 1, 1 | 1 = 1, 0 | 0 = 0
# XOR-> 0 ^ 1 = 1, 1 ^ 0 = 1, 1 ^ 1 = 0, 0 ^ 0 = 0
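The two truth tables can be checked directly in Python. This is just a quick sketch; the `table` name is illustrative.

```python
# Each row is (x, y, x OR y, x XOR y)
table = [(x, y, x | y, x ^ y) for x in (0, 1) for y in (0, 1)]

for x, y, or_result, xor_result in table:
    print(x, y, "OR:", or_result, "XOR:", xor_result)
```

The rows differ only for `1, 1`, where OR gives 1 and XOR gives 0.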

In [95]:
a = 2
# 0010

b = 3
# 0011

#   0010
# ^ 0011
# = 0001 -> a; b = 0011

#   0001
# ^ 0011
# = 0010 -> b; a = 0001

#   0001
# ^ 0010
# = 0011 -> a; b = 0010

# In-place operations: they don't use extra space or memory!

a= 1
b= 2
a = a^b
b = a^b
a = a^b
a,b

(2, 1)

In [83]:
# most pandas operations return a NEW dataframe and leave the original unchanged!

In [84]:
a = {'num': [1,2,3,4,5],
     'isCop': [True, False, True, False, True]}

In [85]:
import pandas as pd
df = pd.DataFrame(a)

In [86]:
df

   num  isCop
0    1   True
1    2  False
2    3   True
3    4  False
4    5   True


In [90]:
df2 = df.set_index('num')
df2

     isCop
num
1     True
2    False
3     True
4    False
5     True


In [89]:
df.set_index('num', inplace=True)
# inplace=True modifies df itself and returns None, so this cell displays nothing


In [91]:
# In MEMORY-> inplace dataframes, BITWISE Operations
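The difference between the two `set_index` calls can be seen in a short sketch. The small frame and the helper variables (`cols_before`, `result`, `cols_after`) are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3], "isCop": [True, False, True]})

# Without inplace, set_index returns a new DataFrame and df keeps its columns
df2 = df.set_index("num")
cols_before = list(df.columns)
print(cols_before)            # ['num', 'isCop']

# With inplace=True, df itself is modified and the call returns None
result = df.set_index("num", inplace=True)
cols_after = list(df.columns)
print(result)                 # None
print(cols_after)             # ['isCop']
```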

In [92]:
# generate all 4 bit binary numbers
# bit options-> 0 and 1
# generate all possible permutations of 0 and 1
for a in [0,1]:
  for b in [0,1]:
    for c in [0,1]:
      for d in [0,1]:
        print(a ,b,c,d)

# GRID SEARCH 


# 10 planets-> 4 factors-> Oxygen, Nitrogen, Water, U

#       O N H2O U
# P1    
# P2    
# P3....

0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1
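The nested loops above compute a Cartesian product, which `itertools.product` from the standard library expresses directly. The same idea underlies a grid search: every combination of candidate values is enumerated. In the sketch below, the `grid` dictionary and its parameter names are hypothetical.

```python
from itertools import product

# All 4-bit patterns, in the same order as the nested loops
bits = list(product([0, 1], repeat=4))
print(len(bits))     # 16
print(bits[5])       # (0, 1, 0, 1)

# Hypothetical parameter grid: every combination of the candidate values
grid = {"lr": [0.1, 0.01], "depth": [2, 4, 8]}
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combos))   # 6
print(combos[0])     # {'lr': 0.1, 'depth': 2}
```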


In [94]:
# print all 3 digit hexadecimals
# (named hex_digits so that the built-in hex() is not shadowed)
hex_digits = [0,1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F']
for a in hex_digits:
  for b in hex_digits:
    for c in hex_digits:
        print(a ,b,c, end=',')

0 0 0,0 0 1,0 0 2,0 0 3,0 0 4,0 0 5,0 0 6,0 0 7,0 0 8,0 0 9,0 0 A,0 0 B,0 0 C,0 0 D,0 0 E,0 0 F,0 1 0,0 1 1,0 1 2,0 1 3,0 1 4,0 1 5,0 1 6,0 1 7,0 1 8,0 1 9,0 1 A,0 1 B,0 1 C,0 1 D,0 1 E,0 1 F,0 2 0,0 2 1,0 2 2,0 2 3,0 2 4,0 2 5,0 2 6,0 2 7,0 2 8,0 2 9,0 2 A,0 2 B,0 2 C,0 2 D,0 2 E,0 2 F,0 3 0,0 3 1,0 3 2,0 3 3,0 3 4,0 3 5,0 3 6,0 3 7,0 3 8,0 3 9,0 3 A,0 3 B,0 3 C,0 3 D,0 3 E,0 3 F,0 4 0,0 4 1,0 4 2,0 4 3,0 4 4,0 4 5,0 4 6,0 4 7,0 4 8,0 4 9,0 4 A,0 4 B,0 4 C,0 4 D,0 4 E,0 4 F,0 5 0,0 5 1,0 5 2,0 5 3,0 5 4,0 5 5,0 5 6,0 5 7,0 5 8,0 5 9,0 5 A,0 5 B,0 5 C,0 5 D,0 5 E,0 5 F,0 6 0,0 6 1,0 6 2,0 6 3,0 6 4,0 6 5,0 6 6,0 6 7,0 6 8,0 6 9,0 6 A,0 6 B,0 6 C,0 6 D,0 6 E,0 6 F,0 7 0,0 7 1,0 7 2,0 7 3,0 7 4,0 7 5,0 7 6,0 7 7,0 7 8,0 7 9,0 7 A,0 7 B,0 7 C,0 7 D,0 7 E,0 7 F,0 8 0,0 8 1,0 8 2,0 8 3,0 8 4,0 8 5,0 8 6,0 8 7,0 8 8,0 8 9,0 8 A,0 8 B,0 8 C,0 8 D,0 8 E,0 8 F,0 9 0,0 9 1,0 9 2,0 9 3,0 9 4,0 9 5,0 9 6,0 9 7,0 9 8,0 9 9,0 9 A,0 9 B,0 9 C,0 9 D,0 9 E,0 9 F,0 A 0,0 A 1,0 A 2,0 A 3,0 A 4,0 A 5,0 A 
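As an alternative sketch, the same 3-digit hexadecimal strings can be produced by counting from 0 to 16**3 - 1 and formatting each number with the `03X` format specifier (zero-padded, width 3, uppercase hex). The `hex_codes` name is illustrative.

```python
# 16 ** 3 = 4096 three-digit hexadecimal numbers, from 000 to FFF
hex_codes = [format(n, "03X") for n in range(16 ** 3)]

print(len(hex_codes))    # 4096
print(hex_codes[:3])     # ['000', '001', '002']
print(hex_codes[-1])     # FFF
```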