## **Reading CSV files with Python**

In [47]:
import csv

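# newline='' is the setting the csv module documentation recommends for files passed to a reader
with open('../datasets/mpg.csv', newline='') as csvfile:
    # Each row becomes a dict keyed by the header row; every value is a string
    mpg = list(csv.DictReader(csvfile))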

mpg[:3] # The first three dictionaries in our list.

[{'': '1',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'auto(l5)',
  'drv': 'f',
  'cty': '18',
  'hwy': '29',
  'fl': 'p',
  'class': 'compact'},
 {'': '2',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'manual(m5)',
  'drv': 'f',
  'cty': '21',
  'hwy': '29',
  'fl': 'p',
  'class': 'compact'},
 {'': '3',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '2',
  'year': '2008',
  'cyl': '4',
  'trans': 'manual(m6)',
  'drv': 'f',
  'cty': '20',
  'hwy': '31',
  'fl': 'p',
  'class': 'compact'}]

**Summing data in a CSV file**

In [48]:
# DictReader yields strings, so convert each 'cty' value to float before summing
sum(float(d['cty']) for d in mpg)

3945.0
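
To try the same pattern without the dataset file, here is a minimal sketch that runs `csv.DictReader` on an in-memory sample. The three rows are copied from the output above, trimmed to a few columns; `sample_csv`, `rows`, `total_cty`, and `avg_cty` are illustrative names that do not appear in the original notebook.

```python
import csv
import io

# Three rows taken from the output above, trimmed to a few columns.
sample_csv = """manufacturer,model,cty,hwy
audi,a4,18,29
audi,a4,21,29
audi,a4,20,31
"""

# io.StringIO lets DictReader read from a string as if it were a file.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Every value is a string, so convert before doing arithmetic.
total_cty = sum(float(r['cty']) for r in rows)
avg_cty = total_cty / len(rows)

print(total_cty)          # 59.0
print(round(avg_cty, 2))  # 19.67
```

Dividing the sum by `len(rows)` gives the average, which is usually more informative than the raw total.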

## **NumPy**

The NumPy module is mainly used for working with numerical
data. Its core object is the array (ndarray).
With arrays, we can apply a mathematical operation to
every value at once, and we can also combine arrays of
compatible shapes element by element, much as matrices
are added.
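
As a rough illustration of the paragraph above, the sketch below contrasts array arithmetic with the equivalent plain-Python loop. The names `prices`, `quantities`, `doubled`, `totals`, and `totals_list` are made up for this example.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# One operation applies to every element at once (no explicit loop).
doubled = prices * 2

# Operations between two arrays of the same shape work element by element.
totals = prices * quantities

# The equivalent with plain Python lists needs a loop or comprehension.
totals_list = [p * q for p, q in zip([10.0, 20.0, 30.0], [1, 2, 3])]

print(doubled)      # [20. 40. 60.]
print(totals)       # [10. 40. 90.]
print(totals_list)  # [10.0, 40.0, 90.0]
```

Note that `*` between arrays is element-wise; a true matrix product uses the `@` operator instead.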

In [49]:
import numpy as np
import math 

**Array creation**

In [50]:
a = np.array([1,2,3,4])
print(a.ndim)

1


Two-dimensional array

In [51]:
b = np.array([[1,2,3,4],[5,6,7,8]])
print(b.ndim)

2


In [52]:
print(b.shape)
# shape is (rows, columns): 2 by 4

(2, 4)


Creating arrays filled with default values

In [53]:
c = np.zeros((2,4))
print(c)
d = np.ones((3,2))
print(d)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
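
The arrays above hold floats by default. As a small sketch of two related options, `dtype` changes the element type and `np.full` fills an array with any constant; the names `int_zeros` and `sevens` are illustrative.

```python
import numpy as np

# np.zeros and np.ones default to floats; dtype changes the element type.
int_zeros = np.zeros((2, 4), dtype=int)

# np.full fills an array of the given shape with any constant value.
sevens = np.full((2, 3), 7)

print(int_zeros)
print(sevens)
```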


**Array operations**

In [54]:
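e = np.arange(0, 10, 2)    # start 0, stop before 10, step 2
f = np.linspace(0, 10, 5)  # 5 evenly spaced values from 0 to 10 inclusive

print(e)
print(f)
print(e+f)   # element-wise addition
print(e*f)   # element-wise multiplication (not a matrix product)
print(e**2)  # each element squared
print(e-f)   # element-wise subtraction
print(e/f)   # element-wise division; 0/0 in the first position gives nan and a RuntimeWarning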


[0 2 4 6 8]
[ 0.   2.5  5.   7.5 10. ]
[ 0.   4.5  9.  13.5 18. ]
[ 0.  5. 20. 45. 80.]
[ 0  4 16 36 64]
[ 0.  -0.5 -1.  -1.5 -2. ]
[nan 0.8 0.8 0.8 0.8]
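
The `nan` in the last line comes from dividing 0 by 0. One possible way to avoid it, sketched here as an option rather than something the notebook itself does, is to pass `where` and `out` to `np.divide` so the division is skipped wherever the denominator is zero. The name `safe` is illustrative.

```python
import numpy as np

e = np.arange(0, 10, 2)
f = np.linspace(0, 10, 5)

# Divide only where the denominator is non-zero; the other positions keep
# the value already stored in `out` (0.0 here).
safe = np.divide(e, f, out=np.zeros_like(f), where=(f != 0))

print(safe)
```

Whether 0.0 is a sensible stand-in for the undefined result depends on the data; `np.nan` or a masked array may be more appropriate in other cases.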




In [55]:
# Use reshape to change the shape of an array

g = np.array([1,2,3,4])
print(g.reshape(2,2))

[[1 2]
 [3 4]]
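
A short sketch of two reshape details that the cell above does not show: one dimension can be given as -1 so NumPy infers it, and the original array keeps its shape. The array `h` is an illustrative name, and this `g` is a fresh 12-element example, not the notebook's `g`.

```python
import numpy as np

g = np.arange(1, 13)   # 12 elements: 1 through 12

# -1 asks NumPy to infer that dimension from the total number of elements.
h = g.reshape(3, -1)

print(h.shape)   # (3, 4)
print(g.shape)   # (12,)  reshape did not modify g itself
```

The requested shape must account for every element; asking for a shape that does not fit, such as `g.reshape(5, -1)` here, raises a `ValueError`.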


In [56]:
# Multiplying by a scalar applies to every element
g = g * 2
print(g)

[2 4 6 8]


## **REGEX**

In [74]:
import re

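# Named text rather than str, so the built-in str type is not shadowed
text = "Amy works diligently. Amy gets good grades. Our student Amy is successful."

# search() returns the first match as a Match object (or None if there is no match)
print(re.search("Amy", text))

# split() cuts the string at every match and returns the pieces
print(re.split("Amy", text))

# findall() returns every match as a list of strings
print(re.findall("Amy", text))


<re.Match object; span=(0, 3), match='Amy'>
['', ' works diligently. ', ' gets good grades. Our student ', ' is successful.']
['Amy', 'Amy', 'Amy']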
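
The Match object printed by `re.search` can also be queried directly. The following is a minimal sketch; the shortened `text` string and the name `match` are chosen for this example only.

```python
import re

text = "Amy works diligently. Amy gets good grades."

match = re.search("Amy", text)

# re.search returns a Match object, or None when nothing is found,
# so the result can be used directly in a condition.
if match:
    print(match.group())   # Amy
    print(match.span())    # (0, 3)

print(re.search("Bob", text))  # None
```

Checking the result before calling `.group()` avoids an `AttributeError` when the pattern is absent.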

### **Patterns and character classes**

#### **Set Operators**

In [58]:
# A set such as [AB] matches any single character listed inside the brackets
grades="ACAAAABCBCBAA"
re.findall("[AB]", grades)

['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'A', 'A']

In [59]:
# An A followed by any one character in the range B to C
re.findall("[A][B-C]", grades)

['AC', 'AB']

In [60]:
# The alternation operator | (or) gives the same result
re.findall("AB|AC", grades)

['AC', 'AB']
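
As a small follow-up sketch, the same search can be written more compactly, because a single literal character does not need brackets and the range B-C covers the same characters as the explicit set BC.

```python
import re

grades = "ACAAAABCBCBAA"

# "A[BC]" is equivalent to "[A][B-C]" for this input.
print(re.findall("A[BC]", grades))   # ['AC', 'AB']

# A range can cover every grade letter at once, so every character matches.
print(re.findall("[A-C]", grades) == list(grades))  # True
```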

**Caret - ^**

In [61]:
# Check if the 1st character is A
re.findall("^A", grades)

['A']

In [62]:
# Check if the first character is B
re.findall("^B", grades)

[]

In [63]:
# Since it is not B, it will return an empty list

**Dollar Sign - $**

In [64]:
# Check if the last character is A
re.findall("A$", grades)

['A']

In [65]:
# Check if the last character is B
re.findall("B$", grades)

[]

In [66]:
# Since the last character is not B, it will return an empty list

In [67]:
# Inside a set, ^ negates it: match every character that is not A
re.findall("[^A]", grades)

['C', 'B', 'C', 'B', 'C', 'B']

In [68]:
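# Anchor plus negated set: match only if the first character is not A.
# The string starts with A, so the result is an empty list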
re.findall("^[^A]", grades)

[]

#### **Quantifiers**


In [69]:
# A quantifier states how many times a pattern must repeat for a match. The most basic
# quantifier is written e{m,n}, where e is the expression or character being matched, m is the minimum
# number of repetitions, and n is the maximum number of repetitions.

# Let's use these grades as an example. How many times has this student been on a back-to-back A's streak?
re.findall("A{2,10}", grades) # 2 is the min and 10 is the max

['AAAA', 'AA']

In [70]:
# So we see that there were two streaks, one where the student had four A's, and one where they had only two
# A's

# We might try and do this using single values and just repeating the pattern
re.findall("A{1,1}A{1,1}", grades)

['AA', 'AA', 'AA']

In [71]:
# As you can see, this is different from the first example. The first pattern looks for any run
# of two to ten A's in a row, so it sees four A's as a single streak. The second pattern looks for
# exactly two A's back to back, so it splits the four A's into two separate matches. We say that the regex
# processor begins at the start of the string and consumes characters that match the pattern as it goes.

# It's important to note that the quantifier syntax must follow the {m,n} form exactly. In particular,
# an extra space between the braces makes Python treat the braces as literal text, so you get an empty result
re.findall("A{2, 2}",grades)

[]
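
A small check, offered as an illustration rather than as part of the original notebook, suggests why the result is empty: with the space, Python's `re` module reads the braces as ordinary characters. The string `literal_text` is a made-up example.

```python
import re

# Hypothetical string that literally contains the characters "A{2, 2}".
literal_text = "xxA{2, 2}xx"

# With the space, the braces are not a quantifier, so the pattern matches itself.
print(re.findall("A{2, 2}", literal_text))   # ['A{2, 2}']

# Without the space, {2,2} is a real quantifier and matches exactly two A's.
print(re.findall("A{2,2}", "xxAAxx"))        # ['AA']
```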

In [72]:
# A single number in braces, e{m}, matches exactly m repetitions
re.findall("A{2}",grades)

['AA', 'AA', 'AA']
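
Python's `re` module also accepts shorthand forms of these quantifiers. The sketch below uses two of them on the same `grades` string; `streaks` is an illustrative name.

```python
import re

grades = "ACAAAABCBCBAA"

# "+" is shorthand for {1,}: one or more repetitions.
streaks = re.findall("A+", grades)
print(streaks)                       # ['A', 'AAAA', 'AA']

# Leaving out the upper bound, as in {2,}, means "two or more".
print(re.findall("A{2,}", grades))   # ['AAAA', 'AA']

# The longest run of A's can then be picked with max().
print(max(streaks, key=len))         # AAAA
```

An open-ended bound avoids having to guess a maximum such as 10 when the streak length is unknown.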

In [73]:
# Using this, we could find a decreasing trend in a student's grades
re.findall("A{1,10}B{1,10}C{1,10}",grades)

['AAAABC']
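
To see where in the string the decreasing trend occurs, one option is `re.finditer`, which yields Match objects instead of plain strings. This is a sketch using the same pattern as above; `trends` is an illustrative name.

```python
import re

grades = "ACAAAABCBCBAA"

# Collect (matched text, start index, end index) for each decreasing run.
trends = [(m.group(), m.start(), m.end())
          for m in re.finditer("A{1,10}B{1,10}C{1,10}", grades)]

print(trends)   # [('AAAABC', 2, 8)]
```

The end index is exclusive, so `grades[2:8]` reproduces the matched text.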