# Rounding Decimals

There are primarily five ways of rounding off decimals in NumPy:

truncation

fix

rounding

floor 

ceil

# Truncation

Truncation removes the decimal part and returns the float closest to zero. Use the trunc() or fix() function.


In [1]:
#Truncate elements of following array:

import numpy as np

arr = np.trunc([-3.1666, 3.6667])

print(arr)

[-3.  3.]


In [2]:
# same example using fix():

import numpy as np

arr = np.fix([-6.7879,6.9999 ])

print(arr)

[-6.  6.]


# Rounding 

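The around() function rounds to the nearest value at the given number of decimals (0 by default): the preceding digit is incremented by 1 if the remainder is more than half, and left unchanged if it is less. Exact halves are rounded to the nearest even value.

E.g. rounded to 1 decimal place, 3.166666 is 3.2.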

In [6]:
# Round the elements of the following array to the nearest integer:

import numpy as np

arr = np.around([3.56666, 2.16666])

print(arr)

[4. 2.]


In [8]:
import numpy as np

arr = np.around([3.16666, 2.56666])

print(arr)

[3. 3.]
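The examples above round to the nearest integer. As a minimal sketch, the optional `decimals` argument of `np.around()` keeps a chosen number of decimal places, and an input that sits exactly halfway between two integers shows the round-half-to-even behaviour. The variable names are only illustrative.

```python
import numpy as np

# Round to one decimal place using the decimals argument.
one_decimal = np.around(3.16666, 1)
print(one_decimal)

# Exact halves are rounded to the nearest even integer.
halves = np.around([0.5, 1.5, 2.5])
print(halves)
```

Because of the half-to-even rule, 0.5 and 2.5 round down while 1.5 rounds up.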


# Floor

The floor() function rounds a decimal down to the nearest lower integer.

E.g. floor of 3.166 is 3.

In [9]:
# floor element of following array:

import numpy as np

arr = np.floor([-3.1666, 36667])

print(arr)

[-4.0000e+00  3.6667e+04]


# Ceil

The ceil() function rounds a decimal up to the nearest upper integer.

E.g. ceil of 3.166 is 4.

In [10]:
# ceil the elements of following array:

import numpy as np

arr = np.ceil([-3.12345 , 6.1000])

print(arr)

[-3.  7.]


# Numpy Logs 

# Logs

NumPy provides functions to compute logarithms at base 2, e and 10.

We will also explore how to take the log at any base by creating a custom ufunc.

The log functions place -inf in an element when the input is 0 and nan when the input is negative, because the log cannot be computed as a finite real number.
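A minimal sketch of what happens for inputs whose log cannot be computed as a finite real number. It assumes you want to silence the runtime warnings NumPy normally emits for such inputs, which `np.errstate` does for the duration of the `with` block.

```python
import numpy as np

# Suppress the RuntimeWarnings NumPy would otherwise emit for 0 and -1.
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.log(np.array([1.0, 0.0, -1.0]))

# log(1) is 0, log(0) is -inf, and log(-1) is nan.
print(result)
```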

# Log at Base 2

Use the log2() function to perform log at the base 2.

In [15]:
import numpy as np

arr = np.arange(2,10)

print(np.log2(arr))

[1.         1.5849625  2.         2.32192809 2.5849625  2.80735492
 3.         3.169925  ]


# Log at Base 10 

Use the log10() function to perform log at the base 10.

In [16]:
import numpy as np

arr = np.arange(1, 10)

print(np.log10(arr))


[0.         0.30103    0.47712125 0.60205999 0.69897    0.77815125
 0.84509804 0.90308999 0.95424251]


In [17]:
import numpy as np

arr = np.arange(2, 10)

print(np.log10(arr))


[0.30103    0.47712125 0.60205999 0.69897    0.77815125 0.84509804
 0.90308999 0.95424251]


# Natural Log, or Log at Base e

Use the log() function to perform log at the base e.

In [18]:
# Find log at base e of all elements of the following array:

import numpy as np

arr = np.arange(1,10)

print(np.log(arr))

[0.         0.69314718 1.09861229 1.38629436 1.60943791 1.79175947
 1.94591015 2.07944154 2.19722458]


# Log at any Base

NumPy does not provide a function to take the log at an arbitrary base, so we can use the frompyfunc() function with the built-in math.log() function, which takes two input parameters and returns one output:

In [19]:
from math import log 
import numpy as np

nplog = np.frompyfunc(log,2,1)

print(nplog(100,15))

1.7005483074552052


In [20]:
from math import log 
import numpy as np

nplog = np.frompyfunc(log,2,1)

print(nplog(1000,15))

2.5508224611828076
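As an alternative sketch, the change-of-base identity log_b(x) = ln(x) / ln(b) gives the same result using only NumPy's own ufuncs, so it works element-wise on whole arrays. The helper name `log_base` is hypothetical and not part of NumPy.

```python
import numpy as np

def log_base(x, base):
    """Hypothetical helper: log of x at an arbitrary base via change of base."""
    return np.log(x) / np.log(base)

values = np.array([15.0, 100.0, 225.0])
result = log_base(values, 15)
print(result)
```

The middle element matches the value printed by the frompyfunc() example for `nplog(100, 15)`.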


# Numpy summations 

# summations

What is the difference between summation and addition?

Addition is done between two arguments, whereas summation happens over n elements.

In [21]:
# Add the values in arr1 to the values in arr2:

import numpy as np

arr1 = np.array([2,3,4])
arr2 = np.array([5,6,7])

newarr = np.add(arr1,arr2)

print(newarr)

[ 7  9 11]


In [22]:
# sum the values in arr1 and the values in arr2:

import numpy as np

arr1 = np.array([4,5,6])
arr2 = np.array([7,8,9])

newarr = np.sum([arr1,arr2])

print(newarr)


39


# Summation Over an Axis

If you specify axis=1, NumPy will sum the numbers in each array.

In [23]:
# Perform summation on the following arrays over axis 1:

import numpy as np

arr1 = np.array([2,3,4])
arr2 = np.array([5,6,7])

newarr = np.sum([arr1,arr2], axis=1)

print(newarr)

[ 9 18]


In [24]:
import numpy as np

arr1 = np.array([2,3,10])
arr2 = np.array([5,6,4])

newarr = np.sum([arr1,arr2], axis=1)

print(newarr)

[15 15]
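For comparison, a short sketch of how the `axis` argument changes the result on the same data: `axis=0` adds matching positions across the arrays, while `axis=1` adds the numbers inside each array. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([[2, 3, 4],
                [5, 6, 7]])

column_sums = np.sum(arr, axis=0)  # one total per column: 2+5, 3+6, 4+7
row_sums = np.sum(arr, axis=1)     # one total per row: 2+3+4, 5+6+7

print(column_sums)
print(row_sums)
```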


# Cumulative Sum

A cumulative sum adds the elements of an array step by step, keeping each partial total.

E.g. the partial sums of [1, 2, 3, 4] are [1, 1+2, 1+2+3, 1+2+3+4] = [1, 3, 6, 10].

Perform the partial sum with the cumsum() function.

In [25]:
# perform cumulative summation on the following array:

import numpy as np

arr = np.array([5,7,8])

new = np.cumsum(arr)

print(new)


[ 5 12 20]
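To make the idea of partial totals concrete, here is a small sketch that builds the running total with a plain Python loop and compares it with `np.cumsum()`. The loop is only for illustration; in practice cumsum() is the idiomatic choice.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Build the running total by hand.
running = []
total = 0
for value in arr:
    total += value
    running.append(int(total))

print(running)
print(np.cumsum(arr))
```

Both lines show the same four partial sums, one per input element.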


# Numpy Products

# Products

To find the product of the elements in an array, use the prod() function.

In [26]:
# find the product of the elements of this array:

import numpy as np

arr = np.array([1,2,3,4])

x = np.prod(arr)

print(x)

24


In [27]:
#find the product of the elements of two arrays:

import numpy as np

arr1 = np.array([2,5,6,7,9])
arr2 = np.array([1,3,4,8,10])

new = np.prod([arr1,arr2])

print(new)

3628800


# Product Over an Axis
If you specify axis=1, NumPy will return the product of each array.



In [28]:
import numpy as np

arr1 = np.array([2,5,6,7,9])
arr2 = np.array([1,3,4,8,10])

newarr = np.prod([arr1, arr2], axis=1)

print(newarr)


[3780  960]


# Cumulative Product

A cumulative product multiplies the elements of an array step by step, keeping each partial product.

E.g. the partial products of [1, 2, 3, 4] are [1, 1*2, 1*2*3, 1*2*3*4] = [1, 2, 6, 24].

Perform the partial product with the cumprod() function.

In [29]:
import numpy as np

arr = np.array([5, 6, 7, 8])

newarr = np.cumprod(arr)

print(newarr)


[   5   30  210 1680]


# Numpy Differences

# Differences

A discrete difference means subtracting two successive elements.

E.g. for [1, 2, 3, 4] the discrete difference would be [2-1, 3-2, 4-3] = [1, 1, 1].

To find the discrete difference, use the diff() function.

In [30]:
# compute discrete difference of the following array:

import numpy as np

arr = np.array([10,15,25,6])

x = np.diff(arr)

print(x)

[  5  10 -19]


We can perform this operation repeatedly by giving parameter n.

E.g. for [1, 2, 3, 4], the discrete difference with n = 2 would be [2-1, 3-2, 4-3] = [1, 1, 1] , then, since n=2, we will do it once more, with the new result: [1-1, 1-1] = [0, 0]


In [34]:
# compute discrete difference of the following array twice:

import numpy as np

arr = np.array([10,16,30,40])

new = np.diff(arr, n=2)

print(new)

[ 8 -4]


In [35]:
import numpy as np

arr = np.array([10,16,30,40])

new = np.diff(arr, n=3)

print(new)

[-12]
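A brief sketch showing that `n=2` is the same as applying diff() twice, using the array from the examples above. The intermediate names `once` and `twice` are only illustrative.

```python
import numpy as np

arr = np.array([10, 16, 30, 40])

once = np.diff(arr)    # [16-10, 30-16, 40-30]
twice = np.diff(once)  # differences of the differences

print(once)
print(twice)
print(np.diff(arr, n=2))
```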


# NumPy LCM (Lowest Common Multiple)

# Finding LCM (Lowest Common Multiple)

The lowest common multiple is the smallest number that is a common multiple of two numbers.

In [36]:
# Find the LCM of the following two numbers:

import numpy as np

num1 = 4
num2 = 6

x = np.lcm(num1 ,num2)

print(x)

12


# Finding LCM in Arrays
To find the Lowest Common Multiple of all values in an array, you can use the reduce() method.

The reduce() method will use the ufunc, in this case the lcm() function, on each element, and reduce the array by one dimension.

In [37]:
# find the LCM of the values of the following array:

import numpy as np

arr = np.array([3,6,9])

x = np.lcm.reduce(arr)

print(x)

18


In [38]:
import numpy as np

arr = np.arange(1, 11)

x = np.lcm.reduce(arr)

print(x)


2520


# NumPy GCD (Greatest Common Divisor)


# Finding GCD (Greatest Common Divisor)

The GCD (Greatest Common Divisor), also known as HCF (Highest Common Factor), is the biggest number that is a common factor of both numbers.

In [40]:
# find the HCF of the following two numbers:

import numpy as np

num1 = 16 
num2 = 24

x = np.gcd(num1,num2)

print(x)  # the output is 8 because 8 is the highest common divisor of 16 and 24

8


# Finding GCD in Arrays

To find the Highest common factor of all values in an array, you can use the reduce() method.

The reduce() method will use the ufunc, in this case the gcd() function, on each element, and reduce the array by one dimension.

In [41]:
#find the Gcd for all of the numbers in the following array:

import numpy as np

arr = np.array([20,8,32,36,16])

x = np.gcd.reduce(arr)

print(x)

4
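As a rough sketch of what reduce() does here, the two-argument `np.gcd` can be folded over the array from left to right with Python's `functools.reduce`. This is an illustration of the idea, not a description of how NumPy implements it internally.

```python
from functools import reduce

import numpy as np

arr = np.array([20, 8, 32, 36, 16])

# Fold gcd pairwise: gcd(gcd(gcd(gcd(20, 8), 32), 36), 16)
pairwise = reduce(np.gcd, arr)

print(pairwise)
print(np.gcd.reduce(arr))
```

The same pattern applies to `np.lcm` and `np.lcm.reduce()`.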


# Numpy Trigonometric functions

# Trigonometric Functions

Numpy provides the ufuncs sin(), cos() and tan() that take values in radians and produce the corresponding sin, cos and tan values.

In [49]:
# Find the sine value of pi/2:

import numpy as np

x = np.sin(np.pi/2)

print(x)

1.0


In [54]:
# using cos()

import numpy as np

x = np.cos(np.pi)

print(x)

-1.0


In [57]:
# using tan()

import numpy as np

x = np.tan(np.pi/4)

print(x)

0.9999999999999999


In [44]:
# find sin values for all of the values in arr:

import numpy as np

arr = np.array([np.pi/2,np.pi/4,np.pi/8,np.pi/16])

x = np.sin(arr)

print(x)

[1.         0.70710678 0.38268343 0.19509032]


# Convert Degrees into Radians

By default, all of the trigonometric functions take radians as parameters, but NumPy can convert degrees to radians and vice versa.

__radian_values = pi/180 * degree_values__

In [45]:
# convert all of the values in following array arr to radians:

import numpy as np

arr = np.array([90,180,270,360])

x = np.deg2rad(arr)

print(x)

[1.57079633 3.14159265 4.71238898 6.28318531]
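The conversion formula can be checked directly. This sketch applies pi/180 by hand and compares the result with `np.deg2rad()`; the names `degrees` and `manual` are only illustrative.

```python
import numpy as np

degrees = np.array([90, 180, 270, 360])

# Apply the formula radians = pi/180 * degrees by hand.
manual = degrees * np.pi / 180

print(manual)
print(np.allclose(manual, np.deg2rad(degrees)))
```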


# Radian to Degrees

In [50]:
import numpy as np

arr = np.array([np.pi/2, np.pi, 1.5*np.pi, 2*np.pi])

x = np.rad2deg(arr)

print(x)


[ 90. 180. 270. 360.]


# Finding Angles

Finding angles from values of sine, cosine and tangent, i.e. the inverse functions of sin, cos and tan (arcsin, arccos, arctan).

NumPy provides the ufuncs arcsin(), arccos() and arctan(), which return radian values for the given sin, cos and tan values.

In [51]:
# find the angle whose sine is 1.0:

import numpy as np

x = np.arcsin(1.0)

print(x)

1.5707963267948966


In [58]:
import numpy as np

x = np.arccos(1.0)

print(x)

0.0


In [59]:
import numpy as np

x = np.arctan(1.0)

print(x)

0.7853981633974483


# Angles of Each Value in Arrays


In [52]:
# find the angle for all of the sine values in the array

import numpy as np

arr = np.array([1,-1,0.1])

x = np.arcsin(arr)

print(x)

[ 1.57079633 -1.57079633  0.10016742]


# Hypotenuse

Finding the hypotenuse using the Pythagorean theorem in NumPy.

NumPy provides the hypot() function, which takes the base and perpendicular values and returns the hypotenuse based on the Pythagorean theorem.



In [61]:
# find the hypotenuse for base 4 and perpendicular 5:

import numpy as np

a = 4
b = 5

x = np.hypot(a,b)

print(x)


6.4031242374328485
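A small sketch comparing hypot() with the Pythagorean formula written out by hand, using the classic 3-4-5 right triangle. The names `base` and `perpendicular` are only illustrative.

```python
import numpy as np

base = 3
perpendicular = 4

# Pythagorean theorem: hypotenuse = sqrt(base^2 + perpendicular^2)
by_formula = np.sqrt(base**2 + perpendicular**2)
by_hypot = np.hypot(base, perpendicular)

print(by_formula, by_hypot)
```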


# Numpy Hyperbolic Functions

# Hyperbolic functions

NumPy provides the ufuncs sinh(), cosh() and tanh(), which take values in radians and produce the corresponding sinh, cosh and tanh values.

In [67]:
# find sinh value of PI/2

import numpy as np

x = np.sinh(np.pi/2)

print(x)

2.3012989023072947


In [68]:
# find cosh values for all of the values in arr:

import numpy as np

arr = np.array([np.pi/2,np.pi/3,np.pi/4,np.pi/5])

x = np.cosh(arr)

print(x)

[2.50917848 1.60028686 1.32460909 1.20397209]


# Finding Angles

Finding angles from values of hyperbolic sine, cosine and tangent, i.e. the inverse functions of sinh, cosh and tanh (arcsinh, arccosh, arctanh).

NumPy provides the ufuncs arcsinh(), arccosh() and arctanh(), which return radian values for the given sinh, cosh and tanh values.

In [69]:
# find the angle whose sinh is 1.0:

import numpy as np

x = np.arcsinh(1.0)

print(x)

0.881373587019543


# Angles of each value in arrays



In [74]:
# find the angle for all of the tanh values in array:

import numpy as np

arr = np.array([0.1,0.2,0.5])

x = np.arctanh(arr)

print(x)

[0.10033535 0.20273255 0.54930614]


# Numpy set operations 

# What is a set

A set in mathematics is a collection of unique elements.

Sets are used for operations involving frequent intersection, union and difference operations.

# Create sets in numpy

We can use NumPy's unique() method to find the unique elements of any array, i.e. to create a set array. Remember that set arrays should only be 1-D arrays.

In [75]:
# convert following array with repeated elements to a set:

import numpy as np

arr = np.array([22,22,33,44,55,33,22,11,11,55,])


x = np.unique(arr)

print(x)

[11 22 33 44 55]


# Finding Union

To find the unique values of two arrays combined, use the union1d() method.

In [76]:
# find union of the following two set arrays:

import numpy as np

arr1 = np.array([1,2,3,4])
arr2 = np.array([3,4,5,6])

new = np.union1d(arr1,arr2)

print(new)

[1 2 3 4 5 6]


In [77]:
import numpy as np

arr1 = np.array([11,21,31,14])
arr2 = np.array([13,41,51,16])

new = np.union1d(arr1,arr2)

print(new)

[11 13 14 16 21 31 41 51]


# Finding Intersection

To find only the values that are present in both arrays, use the intersect1d() method.

In [78]:
import numpy as np

arr1 = np.array([1,2,3,4])
arr2 = np.array([2,3,1,5])

x = np.intersect1d(arr1,arr2)

print(x)

[1 2 3]


# Finding Difference

To find only the values in the first set that are not present in the second set, use the setdiff1d() method.

In [79]:
import numpy as np

set1 = np.array([56, 22, 3, 4])
set2 = np.array([3, 4, 5, 6])

newarr = np.setdiff1d(set1, set2, assume_unique=True)

print(newarr)

[56 22]


# Finding Symmetric Difference

To find only the values that are present in exactly one of the two sets, not in both, use the setxor1d() method.

In [80]:
# find the symmetric difference of arr1 and arr2:

import numpy as np

arr1 = np.array([1,2,3,4])
arr2 = np.array([3,4,5,6])

x = np.setxor1d(arr1,arr2)

print(x)

[1 2 5 6]
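The set functions fit together: as a sketch, the symmetric difference can be rebuilt as the union minus the intersection. The intermediate names below are only illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

union = np.union1d(arr1, arr2)       # values in either array
common = np.intersect1d(arr1, arr2)  # values in both arrays

# Remove the shared values from the union.
sym_diff = np.setdiff1d(union, common)

print(sym_diff)
print(np.array_equal(sym_diff, np.setxor1d(arr1, arr2)))
```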


In [81]:
arr = np.array([1,2,3,4,5,6,7])

print(arr[2:5])

[3 4 5]


# pandas Introduction

# What is Pandas?

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.


# Why Use Pandas?

Pandas allows us to analyze big data and draw conclusions based on statistical theories.

Pandas can clean messy data sets and make them readable and relevant.

Relevant data is very important in data science.

# What Can Pandas Do?

Pandas gives you answers about the data, such as:

Is there a correlation between two or more columns?

What is the average value?

What is the max value?

What is the min value?

Pandas is also able to delete rows that are not relevant or that contain wrong values, like empty or null values.

This is called cleaning the data.

In [82]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [84]:
import pandas as pd


mydataset = {
    
    'cars' : ["BMW","Volvo","Ford"],
    'passings' : [3,7,2]
}

my_var = pd.DataFrame(mydataset)

print(my_var)



    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2


# Pandas Series

# What is a Series?

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

In [86]:
import pandas as pd 

a = [1,2,3,4]

print(pd.Series(a))

0    1
1    2
2    3
3    4
dtype: int64


In [87]:
import pandas as pd

a = ['Elango','Harish','Mani','Sheyam']

x = pd.Series(a)

print(x)

0    Elango
1    Harish
2      Mani
3    Sheyam
dtype: object


# Labels 

If nothing else is specified, the values are labeled with their index number. The first value has index 0, the second value has index 1, and so on.

This label can be used to access a specified value.

In [88]:
# return the first value of the series:

print(x[0])

Elango


# Create Labels

With the index argument, you can name your own labels.

In [93]:
# create your own labels:

import pandas as pd

a = [1,7,2]

x = pd.Series(a, index = ['x','y','z'])

print(x)

print(x['z'])

x    1
y    7
z    2
dtype: int64
2
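Once a Series has custom labels, a value can be reached either by its label or by its position. A minimal sketch using the `.loc` (label-based) and `.iloc` (position-based) accessors; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 7, 2], index=["x", "y", "z"])

by_label = s.loc["y"]    # look up by the custom label
by_position = s.iloc[0]  # look up by integer position, starting at 0

print(by_label)
print(by_position)
```

Being explicit about `.loc` versus `.iloc` avoids ambiguity when the labels themselves are integers.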


# Key/Value Objects as Series

You can also use a key/value object, like a dictionary, when creating a Series.

In [95]:
# Create a simple Pandas Series from a dictionary:

import pandas as pd 

x = {
    "day1": 366,
    "day2": 365,
    "day3": 364,
    "day4": 363
}

a = pd.Series(x)

print(a)

# the keys of the dictionary become the labels.

day1    366
day2    365
day3    364
day4    363
dtype: int64


To select only some of the items in the dictionary, use the index argument and specify only the items you want to include in the Series.

In [96]:
b = pd.Series(x , index = ["day1","day3"])

print(b)

day1    366
day3    364
dtype: int64


# DataFrames

Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

In [100]:
import pandas as pd

data = {
    "calories" : [420,380,390],
    "duration" : [50,40,45]
}

a = pd.DataFrame(data , index = ["x","y","z"] )

print(a)

   calories  duration
x       420        50
y       380        40
z       390        45


# Pandas DataFrames

# What is a Dataframe?

A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or a table with rows and columns.

In [102]:
import pandas as pd 

data = {
    "calories": [420,380,390],
    "duration": [50,40,45]
}

#load data into a Dataframe object:

df = pd.DataFrame(data)

print(df)

   calories  duration
0       420        50
1       380        40
2       390        45


# Locate Row

As you can see from the result above, the DataFrame is like a table with rows and columns.

Pandas uses the __loc__ attribute to return one or more specified row(s).

In [104]:
# return the row with index 0:

print(df.loc[0])

calories    420
duration     50
Name: 0, dtype: int64


In [105]:
print(df.loc[2])

calories    390
duration     45
Name: 2, dtype: int64


In [107]:
print(df.loc[[1 ,2]])

   calories  duration
1       380        40
2       390        45


# Named Indexes

With the index argument, you can name your own indexes.

In [109]:
# Add a list of names to give each row a name:

import pandas as pd

data = {
    "calories" : [420,380,390],
    "duration" : [50,40,45]
}

df = pd.DataFrame(data, index = ["day1","day2","day3"])

print(df)

      calories  duration
day1       420        50
day2       380        40
day3       390        45


# Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

In [110]:
#refer to the named index:
print(df.loc["day2"])

calories    380
duration     40
Name: day2, dtype: int64
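With named indexes, rows can still be reached by position. A short sketch contrasting label-based `.loc` with position-based `.iloc`, and selecting a single cell by row label and column name. The variable names are only illustrative.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index=["day1", "day2", "day3"])

by_label = df.loc["day2"]            # row selected by its name
by_position = df.iloc[1]             # the same row selected by position
single = df.loc["day2", "calories"]  # one cell: row label, column name

print(by_label.equals(by_position))
print(single)
```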


# Load files into a Dataframe

If your data sets are stored in a file, Pandas can load them into a DataFrame.

In [3]:
import pandas as pd

df = pd.read_csv('age.csv')

df

                    Age
0              Under 18
1                 18-24
2                 25-34
3                 35-44
4                 45-54
5                 55-64
6           65 or Above
7  Prefer Not to Answer
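The file `age.csv` is not included here. As a self-contained sketch, read_csv() also accepts a file-like object, so the same call can be tried on in-memory text. The contents of `csv_text` below are a hypothetical stand-in, not the real file.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "Age\nUnder 18\n18-24\n25-34\n"

sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.shape)
```

The first line of the text becomes the column name and each following line becomes one row.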


In [5]:
rows ,column = df.shape

In [6]:
rows


8

In [7]:
column

1

In [9]:
df.head(3)

        Age
0  Under 18
1     18-24
2     25-34


In [11]:
df.tail(2)

                    Age
6           65 or Above
7  Prefer Not to Answer


In [12]:
df.columns

Index(['Age'], dtype='object')

In [13]:
df.describe()

                         Age
count                      8
unique                     8
top     Prefer Not to Answer
freq                       1
