## Getting Started with Python, Numpy and Pandas

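Python is an interpreted language.  
Python is both strongly typed and dynamically typed.  
*Strongly typed*: the interpreter always respects the type of each value and does not silently convert between unrelated types.  
*Dynamically typed*: a variable is simply a name bound to a value; the type belongs to the value, and the same name can later be bound to a value of a different type.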

### Primitive Types
- integer
- float
- bool
- string

In [1]:
A = 5
B = 5.2
C = True
D = "A String"
E = '''
A 
multi-line
String
'''
print (A, B, C, D, E)

5 5.2 True A String 
A 
multi-line
String



### Dynamically Typed

In [2]:
x = 5
print ("type of variable x:",type(x))
print("x = ",x)

x = "BITS Pilani"
print("type of variable x:",type(x))
print("x = ",x)

x = 5.2
print("type of variable x:",type(x))
print("x = ",x)

type of variable x: <class 'int'>
x =  5
type of variable x: <class 'str'>
x =  BITS Pilani
type of variable x: <class 'float'>
x =  5.2


### Strongly Typed

In [3]:
a = 5.2
x = "A String "
a * x

TypeError: can't multiply sequence by non-int of type 'float'

In [None]:
b = 5
y = "ha"    # y was never defined in the original cell; this value is assumed for illustration
b + y       # raises TypeError: unsupported operand type(s) for +: 'int' and 'str'

#### String Facts: Addition of strings

In [4]:
"BITS-Pilani " + "Hyderabad Campus"   # Concatenation

'BITS-Pilani Hyderabad Campus'

#### String Facts: <u>product of string with integer</u>

In [5]:
b*y

'hahahahaha'

#### String Facts: <u>f-string</u> (formatted string). Allows expression evaluation inside the string.

In [6]:
z = f'Value of b*y: {b*y}'
print(z)

Value of b*y: hahahahaha

#### String Facts: <u>r-string</u> (raw string). Backslashes are kept literally, so escape sequences such as \t are not interpreted.

In [7]:
z = 'The course name:\tCS F441'
print(z)
z = r'The course name:\tCS F441'
print(z)

The course name:	CS F441
The course name:\tCS F441


#### Booleans and Boolean Facts

In [8]:
a = True
print(a)
b = False
print(b)

True
False


#### <u>True is 1</u> and <u>False is 0</u> in float/integer arithmetic and when multiplied with String

In [9]:
True*100

100

In [10]:
False*100.2

0.0

In [11]:
True*"String"

'String'

In [12]:
False*"String"

''
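
The results above follow from bool being a subclass of int. One practical consequence is that a list of booleans can be summed to count the True values. A minimal sketch, where the list flags is made up for illustration:

```python
# bool is a subclass of int, so True counts as 1 and False as 0
flags = [True, False, True, True]   # hypothetical sample data

print(isinstance(True, int))   # True
count_true = sum(flags)        # adds 1 for every True
print(count_true)              # 3
```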

### Comments and Multiline Comments

In [13]:
# A comment
# Python has no dedicated multiline comment syntax.
# A triple-quoted string that is not assigned to anything is often used instead.

"""
a = True
print(a)
b = False
print(b)
"""

'\na = True\nprint(a)\nb = False\nprint(b)\n'

### Type casting
- int()
- float()
- str()
- bool()

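print(int(5.2))           # 5 (truncates toward zero)
print(int("5"))           # 5
print(float(5))           # 5.0
print(float("5.2"))       # 5.2
print(bool(0))            # False
print(bool(5.2))          # True (any non-zero number)
print(bool("A string"))   # True (any non-empty string)
print(bool(""))           # False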
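
Not every conversion succeeds. The sketch below shows two cases that commonly surprise beginners: int() rejects a string holding a decimal number, and bool() treats any non-empty string as True, even the text "False". The variable name value is chosen only for this example.

```python
# int() parses integer literals only, so "5.2" must go through float() first
try:
    int("5.2")
except ValueError as err:
    print("ValueError:", err)

value = int(float("5.2"))   # float("5.2") gives 5.2, int(5.2) truncates to 5
print(value)                # 5

# Any non-empty string is truthy, including the text "False"
print(bool("False"))        # True
```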

### Collection Types
- Lists
- Tuple
- Dictionary
- Set

#### Lists are mutable (changeable) arrays
Ex: names = ['Zach', 'Jay']   # note the square brackets.  
Individual elements are accessed by index, starting from 0.


In [14]:
names = ['Jack', 'Jill']
print(names[0])

print(names)

print(len(names))

names[1] = "John"
print(names)

emptyList = [] #or list()
print(len(emptyList))

Jack
['Jack', 'Jill']
2
['Jack', 'John']
0
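
Indexing also works from the end of a list, and a slice returns several elements at once. A small sketch, using a sample list made up for illustration:

```python
names = ['Jack', 'Jill', 'Rick', 'Kevin']   # sample list for illustration

last = names[-1]        # negative indices count from the end
first_two = names[:2]   # slice: indices 0 and 1 (the stop index is excluded)
print(last)             # Kevin
print(first_two)        # ['Jack', 'Jill']
```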


##### Extend list

In [15]:
names = ['Jack', 'Jill']
names.append('Rick')
len(names)
print (names)

names.extend(['Kevin', 'Adrian'])
print (names)

['Jack', 'Jill', 'Rick']
['Jack', 'Jill', 'Rick', 'Kevin', 'Adrian']


In [16]:
eNames = ['Jack', 'Jill'] + ['Kevin', 'Adrian']
print(eNames)

['Jack', 'Jill', 'Kevin', 'Adrian']
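
The three ways of growing a list behave differently, which the following sketch tries to make explicit: append adds its argument as a single element, extend adds each element of its argument, and + builds a new list without modifying the operands. The variable names are arbitrary.

```python
a = [1, 2]
a.append([3, 4])      # adds the list itself as ONE element
print(a)              # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])      # adds each element of the argument
print(b)              # [1, 2, 3, 4]

c = [1, 2]
d = c + [3, 4]        # builds a new list; c is unchanged
print(c, d)           # [1, 2] [1, 2, 3, 4]
```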


#### Tuples are ordered, immutable (unchangeable) collections
Elements are accessed by index, as with lists.


In [17]:
coordinates = (2., 5., 1.)     # Note the parentheses (round brackets).

print(coordinates[0], coordinates[1])
print(len(coordinates))

2.0 5.0
3


In [18]:
emptyTuple = () #tuple()
print(emptyTuple)
oneElemTuple = (1.5, )   #Comma matters!
print(oneElemTuple)

()
(1.5,)


In [19]:
coordinates[0] =3

TypeError: 'tuple' object does not support item assignment

#### Dictionary: a collection of key-value pairs.
Elements are accessed using the key as the index. Since Python 3.7, dictionaries preserve insertion order.

In [20]:
course = { 
    "number" :  "CS F441", 
    "name": "Data Visualization",
    "classSize": 62
}
print (course)
print (course["number"])

{'number': 'CS F441', 'name': 'Data Visualization', 'classSize': 62}
CS F441


##### Dictionary Methods
- keys()
- values()
- items()

In [21]:
print(course.keys())

print(course.values())

print(course.items())

dict_keys(['number', 'name', 'classSize'])
dict_values(['CS F441', 'Data Visualization', 62])
dict_items([('number', 'CS F441'), ('name', 'Data Visualization'), ('classSize', 62)])
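
A few everyday dictionary operations, sketched on the same course dictionary: iterating over items(), reading a possibly missing key with get(), and adding or updating entries by assignment. The key "instructor" and the value stored under "semester" are hypothetical and appear only in this example.

```python
course = {
    "number": "CS F441",
    "name": "Data Visualization",
    "classSize": 62
}

# items() yields (key, value) tuples that can be unpacked in a for loop
for key, value in course.items():
    print(key, "->", value)

# get() returns a default instead of raising KeyError for a missing key
instructor = course.get("instructor", "unknown")   # hypothetical key
print(instructor)              # unknown

course["classSize"] = 65       # update an existing key
course["semester"] = "I"       # add a new key (hypothetical value)
print(len(course))             # 4
```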


#### Sets: an unordered, unindexed collection of unique items.

In [22]:
thisset = {"apple", "banana", "cherry"}
print(thisset)

# No duplicate items: Set items are unique
thisset = {"apple", "banana", "cherry", "apple"}
print(thisset)

{'apple', 'cherry', 'banana'}
{'apple', 'cherry', 'banana'}


##### Set Membership

In [23]:
print("banana" in thisset)

print("pineapple" in thisset)

thisset.add("pineapple")

print("pineapple" in thisset)

#Allows union, intersection,difference, ...


True
False
True
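
The set operations mentioned in the comment above can be written with operators. A minimal sketch with two sample sets (the names a and b are arbitrary); the printed order of items may vary because sets are unordered:

```python
a = {"apple", "banana", "cherry"}
b = {"cherry", "pineapple"}

union = a | b          # items in either set
common = a & b         # items in both sets
only_a = a - b         # items in a but not in b

print(union)
print(common)
print(only_a)
```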


### Additional Collections via External Packages
- numpy
- pandas

Note: these packages are not part of the Python standard library. They must be installed separately before they can be imported.
    
pip install numpy  
pip install pandas  

They must then be imported into your code:
- import numpy
- import pandas

The package names are conventionally shortened to aliases for convenience of typing:
- import numpy as np
- import pandas as pd

#### numpy: a Python package used for working with arrays.
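Differences between a list and a NumPy array (type numpy.ndarray, created with numpy.array):
- NumPy arrays have a fixed size; functions such as np.append return a new array instead of growing the original
- all elements are of the same type
- elements occupy contiguous memory
- element-wise operations are vectorized, which is typically much faster than looping over a list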
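
The points in this list can be checked directly. The sketch below assumes NumPy is installed and uses made-up values: mixing integers and a float yields a single common dtype, np.append leaves the original array untouched, and arithmetic applies to every element without a loop.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])     # ints are upcast so all elements share one dtype
print(mixed.dtype)                # float64

values = np.array([1, 5, 3, 4, 5])
bigger = np.append(values, 10)    # returns a NEW array; values keeps its size
print(values.size, bigger.size)   # 5 6

doubled = values * 2              # element-wise arithmetic, no explicit loop
print(doubled)
```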

In [24]:
import numpy as np
A = [1, 5, 3, 4, 5]
nA = np.array(A)
print(nA)
print(nA[0])

[1 5 3 4 5]
1


In [25]:
A.append(10)
print(A)

[1, 5, 3, 4, 5, 10]


In [26]:
nA = np.append(nA, 10)
nA

array([ 1,  5,  3,  4,  5, 10])

In [27]:
A+5

TypeError: can only concatenate list (not "int") to list

In [28]:
print(nA+5)

[ 6 10  8  9 10 15]


In [29]:
print(nA)

[ 1  5  3  4  5 10]


##### NumPy offers a random module for generating random numbers

In [30]:
from numpy import random
print(random.randint(100)) # integer random number in the range [0, 100)

print(random.randint(100, size=(5))) # integer random 1D array of size 5

print (random.rand())   # float random number in the range [0, 1)

print(random.rand(5))   # float random array of size 5

74
[82 67 22  3  9]
0.9195169239723716
[0.93150403 0.57639221 0.72905369 0.53030277 0.04595133]


In [31]:
print(random.uniform(low=0,high=2,size=5)) # uniform distribution in [0,2)

print(random.normal(loc=0,scale = 1, size = 5)) # normal distribution  loc is the mean and scale is the standard deviation

print( random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=5))

print(random.permutation([1,2,3,4,5]))

[0.30226897 0.95290438 1.06575775 1.12433536 1.38516802]
[ 0.15845311  2.29846682 -0.79955452  0.61740633  2.85902235]
[7 5 7 3 5]
[1 3 5 2 4]


In [32]:
random.uniform()

0.837921694831572
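
The random outputs above differ on every run. When reproducible results are needed, one option is NumPy's Generator interface, created with np.random.default_rng and a fixed seed. This is a sketch rather than part of the original notebook, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # the seed value 42 is arbitrary
first = rng.integers(0, 100, size=5)    # five integers in [0, 100)

rng = np.random.default_rng(seed=42)    # same seed gives the same sequence
second = rng.integers(0, 100, size=5)

print(np.array_equal(first, second))    # True
```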

### Pandas: a Python library used for working with data sets.
The name pandas is derived from "panel data" (multi-dimensional tabular data).  
It provides functions for analyzing, cleaning, transforming, exploring, and manipulating data.

Pandas provides two main data structures that make it easier to work with and analyze data in Python: DataFrame (a two-dimensional labeled table with rows and columns) and
Series (a one-dimensional labeled array).

In [33]:
import pandas as pd

student1 = {
 "courses": ["F441", "F311"],
 "grades": ["A", "B+"]
}

df = pd.DataFrame(student1)

In [34]:
print(student1)

{'courses': ['F441', 'F311'], 'grades': ['A', 'B+']}


In [35]:
print(df)

  courses grades
0    F441      A
1    F311     B+


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   courses  2 non-null      object
 1   grades   2 non-null      object
dtypes: object(2)
memory usage: 164.0+ bytes


In [37]:
df.columns

Index(['courses', 'grades'], dtype='object')

In [38]:
df.index

RangeIndex(start=0, stop=2, step=1)

In [39]:
df.index = ["I","II"]

In [40]:
print(df)

   courses grades
I     F441      A
II    F311     B+


##### Accessing Columns and Rows of DataFrame.

In [41]:
# Column access is similar to dictionary access using a key.
print(df["courses"])

I     F441
II    F311
Name: courses, dtype: object


In [42]:
print(df["grades"])

I      A
II    B+
Name: grades, dtype: object


A pandas DataFrame has two indexers for accessing rows and columns.
- .loc is label-based: it selects rows by index label and, optionally, columns by column name.
- .iloc is position-based: it selects rows and columns by integer position.

In [43]:
# Row access
print(df.loc["I"])

courses    F441
grades        A
Name: I, dtype: object


In [44]:
print(df.loc["I","courses"])

F441


In [45]:
print(df.iloc[0,1])

A


In [46]:
print(df.iloc[0]["courses"])

F441


##### Reading tabular data
In pandas, you can use the read_* functions to read various types of data formats into a DataFrame. Here are a few commonly used functions for reading data.
- .read_csv()
- .read_excel()
- .read_json()
- ...
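
To try read_csv without a file on disk, the function also accepts a file-like object. The sketch below wraps a small CSV string in io.StringIO; the column names mirror the data set used in this section, but the values and the variable names csv_text and sample are made up for illustration.

```python
import io
import pandas as pd

# A tiny in-memory CSV (made-up values) standing in for a file on disk
csv_text = """Duration,Pulse,Calories
60,110,409.1
45,109,
30,109,195.1
"""

sample = pd.read_csv(io.StringIO(csv_text))   # accepts a path or a file-like object
print(sample.shape)                           # (3, 3)

missing = sample["Calories"].isna().sum()     # the empty field is read as NaN
print(missing)                                # 1
```

An empty field becomes NaN, which is why a column can report fewer non-null values than the number of rows.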

In [47]:
df = pd.read_csv("data.csv")
print(df.head(10))

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.0
6        60    110       136     374.0
7        45    104       134     253.3
8        30    109       133     195.1
9        60     98       124     269.0


In [48]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  169 non-null    int64  
 1   Pulse     169 non-null    int64  
 2   Maxpulse  169 non-null    int64  
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB


In [49]:
df.describe()

         Duration       Pulse    Maxpulse     Calories
count  169.000000  169.000000  169.000000   164.000000
mean    63.846154  107.461538  134.047337   375.790244
std     42.299949   14.510259   16.450434   266.379919
min     15.000000   80.000000  100.000000    50.300000
25%     45.000000  100.000000  124.000000   250.925000
50%     60.000000  105.000000  131.000000   318.600000
75%     60.000000  111.000000  141.000000   387.600000
max    300.000000  159.000000  184.000000  1860.400000


### Flow Control
- if-then-else
- for loop
- while loop

#### if-then-else statements: 3 forms
- if statement
- if...else statement
- if...elif...else statement

Statements can be code blocks.  
Code blocks are created using indentation.  
Indentation is done using spaces, typically 4 per level (the PEP 8 recommendation), and must be consistent within a block.

In [50]:
number = 1

In [51]:
#
# if statement
#
if number > 0:
    print('Number is positive.')

#
# if...else statement
#
if number > 0:
    print('Positive number')
else:
    print('Zero or negative number')

#
# if...elif...else statement
#
if number > 0:
    print("Positive number")
elif number == 0:
    print('Zero')
else:
    print('Negative number')

Number is positive.
Positive number
Positive number


#### Shorthand if statement (conditional expression)
result = value_if_true if condition else value_if_false

In [52]:
number if (number % 2 == 0) else number*2

2

#### for loop
A for loop runs a block of code once for each item in a sequence.  
It can iterate over any sequence such as a range, list, tuple, string, etc.  
Syntax:  
for val in sequence:  
    # statement(s)

In [53]:
for i in range(5):
    print (i*i)
#The range() function returns a sequence of numbers, starting from 0 by default, and increments by 1 (by default), and stops before a specified number.
#syntax: 
#  range(start, stop, step)

0
1
4
9
16
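
The three arguments of range() are easiest to see by converting the result to a list. A short sketch; the variable names are arbitrary:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]  stop only
print(list(range(2, 6)))       # [2, 3, 4, 5]     start and stop

every_third = list(range(0, 10, 3))   # start, stop, step
print(every_third)                    # [0, 3, 6, 9]

countdown = list(range(5, 0, -1))     # a negative step counts down
print(countdown)                      # [5, 4, 3, 2, 1]
```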


In [54]:
aList = []
for i in range(5):
    aList.append(i*i)
aList.append(10)
print(aList)


[0, 1, 4, 9, 16, 10]


#### Unpacking tuples

In [56]:
aList = [(1,10), (2,20), (3,30)]
for x, y in aList:
    print(x,y)

1 10
2 20
3 30
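
Tuple unpacking is what makes the built-in functions enumerate() and zip() convenient in loops. A minimal sketch with sample lists (the names and scores are hypothetical values):

```python
names = ['Jack', 'Jill', 'Rick']    # sample data for illustration
scores = [81, 92, 77]               # hypothetical values

# enumerate() yields (index, item) tuples
for i, name in enumerate(names):
    print(i, name)

# zip() pairs up items from several sequences
pairs = []
for name, score in zip(names, scores):
    pairs.append((name, score))
print(pairs)   # [('Jack', 81), ('Jill', 92), ('Rick', 77)]
```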


#### List Comprehension
new_list = [expression for element in iterable]  
new_list = [expression for element in iterable if condition]  


In [57]:
[i*i for i in range(5)]

[0, 1, 4, 9, 16]

In [58]:
[i*i for i in range(5) if i % 2 == 0]

[0, 4, 16]
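
A comprehension is shorthand for a loop that appends to a list. The sketch below writes the filtered example both ways and then applies the same pattern to build a dictionary; the variable names are chosen only for this example.

```python
# Comprehension form
squares = [i*i for i in range(5) if i % 2 == 0]

# Equivalent explicit loop
squares_loop = []
for i in range(5):
    if i % 2 == 0:
        squares_loop.append(i*i)

print(squares == squares_loop)   # True

# The same pattern with braces and key: value builds a dictionary
square_of = {i: i*i for i in range(4)}
print(square_of)                 # {0: 0, 1: 1, 2: 4, 3: 9}
```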

#### While loop
A while loop runs a block of code repeatedly as long as a condition remains true.  
The syntax of a while loop is  
~~~
while condition:   
   body of while loop
~~~


In [59]:
i = 0
while i < 5:
    print(i)
    i += 1

0
1
2
3
4
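
A while loop is a natural fit when the number of iterations is not known in advance. As a small sketch, the loop below counts how many integer halvings bring a number down to 1; the starting value 100 is arbitrary.

```python
n = 100          # arbitrary starting value
halvings = 0
while n > 1:
    n = n // 2   # integer division: 100, 50, 25, 12, 6, 3, 1
    halvings += 1

print(halvings)  # 6
```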


### Functions
A function is a block of code that runs only when it is called.

A function can be called with data: parameters are the names listed in the definition, and arguments are the values passed in the call.
A function can optionally return data as a result.

Syntax:
~~~
def function_name(arguments):
    # function body 
    return
~~~

In [60]:
def square(num):
    return num * num
print(square(5))

25


In [61]:
for i in [1,2,3]:
    print(f'Square of {i} = {square(i)}')

Square of 1 = 1
Square of 2 = 4
Square of 3 = 9
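
Two details follow from the description of functions above: a parameter can have a default value, and a function without a return statement returns None. The functions power and greet below are hypothetical examples, not part of the original notebook.

```python
def power(base, exponent=2):
    # exponent has a default value, so the caller may omit it
    return base ** exponent

def greet(name):
    # no return statement, so the call evaluates to None
    print("Hello,", name)

print(power(5))          # 25
print(power(2, 10))      # 1024
result = greet("BITS")   # prints: Hello, BITS
print(result)            # None
```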


#### Lambda functions
A lambda function is a small anonymous function.  
A lambda function can take any number of arguments, but can only have one expression.  
syntax:
~~~
lambda arguments : expression
~~~

In [62]:
x = lambda a : a + 10
print(x(5))

15


In [63]:
x = lambda a, b, c : a + b + c
print(x(5, 6, 2))

13
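
Lambda functions are most often passed directly to another function rather than assigned to a name. A minimal sketch using the built-in functions sorted() and map(); the list of (course number, class size) pairs is made up for illustration.

```python
# made-up (course number, class size) pairs
courses = [("CS F441", 62), ("CS F311", 120), ("CS F213", 95)]

by_size = sorted(courses, key=lambda c: c[1])   # sort by the second tuple element
print(by_size[0])    # ('CS F441', 62)

doubled = list(map(lambda n: n * 2, [1, 2, 3]))
print(doubled)       # [2, 4, 6]
```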
