# Python Basics Refresher

## Lists

### 1. Slicing

In [1]:
l1 = list(range(1,11))
print("l1 -",l1)
print("l1[2:-3] -",l1[2:-3])
print("l1[-7:-4] -",l1[-7:-4])
print("l1[-8:-2:2] -",l1[-8:-2:2])
print("l1[-2:-8:-2] -",l1[-2:-8:-2])
print("l1[-5:] -",l1[-5:])
print("l1[-1:-11:-1] -",l1[-1:-11:-1]) # can omit first/second operand here as well
print("l1[::-1] -",l1[::-1])

l1 - [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l1[2:-3] - [3, 4, 5, 6, 7]
l1[-7:-4] - [4, 5, 6]
l1[-8:-2:2] - [3, 5, 7]
l1[-2:-8:-2] - [9, 7, 5]
l1[-5:] - [6, 7, 8, 9, 10]
l1[-1:-11:-1] - [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
l1[::-1] - [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]


### 2. enumerate

In [2]:
branches = ['Mechanical', 'IT', 'INST', 'CSE', 'IT', 'EXTC']
for index, branch in enumerate(branches, start=1):
    print(index, branch)
    

1 Mechanical
2 IT
3 INST
4 CSE
5 IT
6 EXTC


### 3. Count occurrences in a list

In [4]:
ages = [12,18, 14, 22, 25, 30, 12, 11, 12, 10, 19, 12, 13, 10]
print(ages.count(12))
print(ages.count(80))

4
0


## Sets

### 1. Set creation

In [5]:
science_branches = {'Mechanical', 'IT', 'Management', 'CS', 'IT', 'EXTC', 'Math', 'IT'}
print(science_branches)
print('CS' in science_branches) #membership tests with "in" are fast for sets (hash lookup)

{'Management', 'IT', 'Math', 'CS', 'EXTC', 'Mechanical'}
True


In [6]:
commerce_branches = {'Finance', 'Management','CS','Math', 'Sales'}

In [7]:
empty = set() #not {} - that's an empty dictionary
print(empty)

set()


### 2. Set Operations

In [7]:
print(science_branches.intersection(commerce_branches))

{'Management', 'CS', 'Math'}


In [8]:
print(science_branches.difference(commerce_branches))

{'IT', 'Mechanical', 'EXTC'}


In [9]:
print(science_branches.union(commerce_branches))

{'EXTC', 'Mechanical', 'Math', 'IT', 'Management', 'CS', 'Sales', 'Finance'}


## Dictionaries

### 1. Creation

In [8]:
student = {'name': 'Ashwin', 'age':25, 'course':['comp sci', 'economics'], 5: (1,2)}
print(student)

{'name': 'Ashwin', 'age': 25, 'course': ['comp sci', 'economics'], 5: (1, 2)}


In [9]:
del student[5]
print(student)

{'name': 'Ashwin', 'age': 25, 'course': ['comp sci', 'economics']}


### 2. Access elements

In [11]:
# intuitive way
print(student['name'])
print(student['phone'])

Ashwin


KeyError: 'phone'

In [3]:
# Better way
print(student.get('phone', 'Not found'))

Not found


### 3. Add, update and delete

In [13]:
student['phone'] = '2222-2222'
print(student)

{'name': 'Ashwin', 'age': 25, 'course': ['comp sci', 'economics'], 'phone': '2222-2222'}


In [14]:
del student['phone']
student

{'name': 'Ashwin', 'age': 25, 'course': ['comp sci', 'economics']}

In [18]:
# update with another dictionary
student_new = {'name': 'Anurag', 'age': 21, 'phone':'5555-5555'}
student.update(student_new)

In [20]:
student

{'name': 'Anurag',
 'age': 21,
 'course': ['comp sci', 'economics'],
 'phone': '5555-5555'}

In [21]:
age = student.pop('age')
print(age)

21


### 4. Unpacking Generalization - merge 2 dictionaries

In [15]:
d1 = {'a':1, 'b':2}
d2 = {'c':3, 'd':4}
z1 = {**d1, **d2} #alternate to update
print(z1)

{'a': 1, 'b': 2, 'c': 3, 'd': 4}


In [16]:
# Overlapping keys - merged left to right, so later values overwrite earlier ones
d3 = {'a':9, 'b':10}
z2 = {**d1, **d3}
print(z2)

{'a': 9, 'b': 10}
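
On newer interpreters there is a third way to merge. The sketch below assumes Python 3.9 or later, where dictionaries support the `|` operator; the names `merged` and `unpacked` are illustrative only.

```python
# Assumes Python 3.9+ (dict union operators were added by PEP 584).
d1 = {'a': 1, 'b': 2}
d3 = {'a': 9, 'c': 3}

merged = d1 | d3          # new dict; on overlapping keys the right operand wins
print(merged)

unpacked = {**d1, **d3}   # same result with unpacking, works on Python 3.5+
print(merged == unpacked)

# Neither form modifies its inputs, unlike d1.update(d3)
print(d1)
```

Both forms build a new dictionary, so they are a reasonable choice when the originals must stay unchanged.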


### 5. Sort By Value

In [38]:
d1 = {'a':100, 'b':90, 'c':10, 'd':50}
print(sorted(d1.items(), key= lambda x: x[1]))

[('c', 10), ('d', 50), ('b', 90), ('a', 100)]


## Extra

### 1. is operator

In [28]:
a = 100
b = 100
a is b

True

In [30]:
a=1,2,3
b=1,2,3
print(a is b)

False


In [31]:
print(id(a), id(b))

140104922317232 140104922339656


In [32]:
b = a
print(a is b) ## id(a)==id(b)

True


In [33]:
print(id(a), id(b))

140104922317232 140104922317232
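
The cells above compare identity, not equality. A minimal sketch of the difference, using lists so the result does not depend on interpreter caching (the `a is b` result for `100` comes from CPython caching small integers, which is an implementation detail):

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second list object with equal contents
c = a           # a second name for the same object

print(a == b)   # equality compares values
print(a is b)   # identity compares objects
print(a is c)

c.append(4)     # mutating through one name is visible through the other
print(a)

value = None
print(value is None)  # identity is the usual way to test for None
```

As a rule of thumb, use `==` to compare values and keep `is` for singletons such as `None`.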


### 2. False Values

The following values evaluate to False in Python:
- False
- None
- Zero (any numeric type) - 0, 0j, 0.0
- Empty sequence - (), "", [ ]
- Empty mapping - {}
- Empty set - set()
- Instances of classes whose `__bool__()` returns False or whose `__len__()` returns 0

**Everything else evaluates to True**
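
The last kind of falsy value, an instance that reports itself as empty, can be sketched as follows. `Basket` and `Flag` are hypothetical classes written only for this illustration.

```python
class Basket:
    """Hypothetical container used only for illustration."""
    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        # bool() falls back to __len__ when __bool__ is not defined
        return len(self.items)


class Flag:
    """Hypothetical class that defines its truth value directly."""
    def __init__(self, on):
        self.on = on

    def __bool__(self):
        return self.on


print(bool(Basket()))           # empty basket
print(bool(Basket(['apple'])))  # non-empty basket
print(bool(Flag(False)))
print(bool(object()))           # plain objects are truthy by default
```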

In [28]:
condition = None
if condition:
    print('Inside True')
else:
    print('Inside False')

Inside False


In [29]:
condition = 0.0
if condition:
    print('Inside True')
else:
    print('Inside False')

Inside False


In [31]:
condition = ''
if condition:
    print('Inside True')
else:
    print('Inside False')
condition = []    
if not condition:
    print('Inside True')
else:
    print('Inside False')

Inside False
Inside True


Everything else evaluates to True

In [36]:
condition = [1,2,3]
if condition:
    print('Inside True')
else:
    print('Inside False')

Inside True


## Files

 - Use context managers (the `with` statement) to work with files: the file is closed automatically when the block exits, even if an exception is raised.
 - If we call `open()` directly and forget to call `close()`, file descriptors leak, and we may get an error once the process reaches its limit on open file descriptors.
 - The file object is still accessible after the `with` block, but it is closed, so we can no longer read from it.
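
A minimal sketch of these three points. It writes to a scratch file in a temporary directory (an assumption made so the example does not depend on `av.yml`).

```python
import os
import tempfile

# Hypothetical scratch file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Manual handling: close() must be called even if an error occurs
f = open(path, 'w')
try:
    f.write('hello\n')
finally:
    f.close()

# The with statement performs the same clean-up automatically
with open(path) as f:
    first_line = f.readline()

print(f.closed)   # the name f is still bound, but the file is closed
try:
    f.read()
except ValueError as err:
    message = str(err)
    print('Cannot read:', message)
```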

In [53]:
with open('av.yml') as f:
    print(f.name)
    print(f.mode)
print(f.closed)  

av.yml
r
True


In [55]:
with open('av.yml') as f:
    content = f.read()
    print(content)

name: av
channels:
    - https://conda.anaconda.org/menpo
    - conda-forge
dependencies:
    - python==3.7.0
    - matplotlib==2.2.2
    - numpy==1.15.0
    - pandas==0.23.3
    - jupyter



- The read() method is fine for small files.
- If the file is too big, reading it all at once may exhaust memory.
- Instead, we can read one line at a time:
    - use the readline() method
    - use a for loop on the file object to iterate over the lines, one at a time

In [60]:
with open('av.yml') as f:
    line = f.readline()
    print(line)
    line = f.readline()
    print(line)

name: av

channels:



In [59]:
with open('av.yml') as f:
    for line in f:
        print(line, end='')

name: av
channels:
    - https://conda.anaconda.org/menpo
    - conda-forge
dependencies:
    - python==3.7.0
    - matplotlib==2.2.2
    - numpy==1.15.0
    - pandas==0.23.3
    - jupyter


- Get more control over reading by specifying how many characters to read on each call:

In [35]:
with open('av.yml') as f:
    chunksize = 30
    block = f.read(chunksize)
    print('Block 1:',block)
    block = f.read(chunksize)
    print('Block 2:',block)
    block = f.read(chunksize)
    print('Block 3:',block)

Block 1: name: av
channels:
    - https
Block 2: ://conda.anaconda.org/menpo
  
Block 3:   - conda-forge
dependencies:



Once it reaches the end of the file, read() returns an empty string.
 - We might not know the exact file size, so here is a more general implementation that loops until the file is exhausted:

In [68]:
with open('av.yml') as f:
    chunksize = 30
    block = f.read(chunksize)
    while(len(block)>0):
        print(block,end='*')
        block = f.read(chunksize)        

name: av
channels:
    - https*://conda.anaconda.org/menpo
  *  - conda-forge
dependencies:
*    - python==3.7.0
    - matp*lotlib==2.2.2
    - numpy==1.1*5.0
    - pandas==0.23.3
    -* jupyter
*

We can also inspect and move the file pointer with tell() and seek():

In [70]:
with open('av.yml') as f:
    chunksize = 10
    block = f.read(chunksize)
    print(block)
    print(f.tell())

name: av
c
10


In [78]:
with open('av.yml') as f:
    chunksize = 10
    block = f.read(chunksize)
    print(block,end='*')
    f.seek(0)
    block = f.read(chunksize)
    print(block,end='*')
    f.seek(f.tell()+2)
    block = f.read(chunksize)
    print(block,end='*')

name: av
c*name: av
c*nnels:
   *

- FILE MODES:
    - r: read (the default)
    - w: write (creates the file, or truncates it if it already exists)
    - r+: read and write
    - a: append
    - rb, wb, ab: read/write/append in binary mode
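
The modes above can be tried out on a scratch file. This is a small sketch; the temporary path and the variable names are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical scratch file

with open(path, 'w') as f:      # 'w' creates the file, or truncates an existing one
    f.write('one\n')

with open(path, 'a') as f:      # 'a' keeps the content and writes at the end
    f.write('two\n')

with open(path, 'r+') as f:     # 'r+' reads and writes; the file must already exist
    before = f.read()
    f.seek(0)
    f.write('ONE')              # overwrites in place, does not insert

with open(path) as f:           # default mode is 'r' (text)
    after = f.read()

with open(path, 'rb') as f:     # binary mode returns bytes, not str
    raw = f.read()

print(repr(before), repr(after), type(raw).__name__)
```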

In [37]:
with open('av.yml') as rf, open('av_test.yml','w') as wf:
    for line in rf:
        wf.write(line)

- For large files or binary files, it is feasible neither to read one line at a time (more I/O operations, and binary data has no meaningful lines) nor to read everything at once (memory).
- So we read in fixed-size chunks.

In [42]:
with open('python_basic.ipynb') as rf, open('python_basic_test.ipynb','w') as wf:
    chunksize = 2000
    chunk = rf.read(chunksize)
    while len(chunk) > 0:
        wf.write(chunk)
        chunk = rf.read(chunksize)

In [43]:
with open('python_basic.ipynb') as rf, open('python_basic_test.ipynb','w') as wf:
    for l in rf:
        wf.write(l)

**Compare the execution time of the two approaches above**
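
One way to measure the two copy strategies is sketched below. It assumes a generated test file of 50,000 short lines in a temporary directory rather than the notebook file, and the helper names (`copy_by_chunk`, `copy_by_line`, `timed`) are hypothetical. Timings vary by machine, so no expected numbers are shown.

```python
import os
import tempfile
import time

def copy_by_chunk(src, dst, chunksize=2000):
    """Copy a text file by reading fixed-size chunks."""
    with open(src) as rf, open(dst, 'w') as wf:
        chunk = rf.read(chunksize)
        while chunk:
            wf.write(chunk)
            chunk = rf.read(chunksize)

def copy_by_line(src, dst):
    """Copy a text file one line at a time."""
    with open(src) as rf, open(dst, 'w') as wf:
        for line in rf:
            wf.write(line)

def timed(func, *args):
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start

# Hypothetical test file written to a temp directory
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'source.txt')
dst_chunk = os.path.join(workdir, 'by_chunk.txt')
dst_line = os.path.join(workdir, 'by_line.txt')
with open(src, 'w') as f:
    f.writelines(f'line {i}\n' for i in range(50_000))

t_chunk = timed(copy_by_chunk, src, dst_chunk)
t_line = timed(copy_by_line, src, dst_line)
print(f'chunks: {t_chunk:.4f}s  lines: {t_line:.4f}s')
```

For steadier numbers, repeat each measurement several times (for example with the `timeit` module) and compare the best runs.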

## OS Module - interacting with the OS

In [45]:
import os

### Working with directory

In [46]:
print(os.getcwd()) #present working directory
os.chdir('/home/ubuntu/Downloads/') #change directory
print(os.getcwd())

/home/ubuntu/Documents/av_2018
/home/ubuntu/Downloads


In [47]:
os.listdir('./scripts/') #contents of the given directory

['setup.sh', 'README.md', 'scripts']

In [48]:
os.chdir('/home/ubuntu/Documents/av_2018/')
os.mkdir('temp')

In [49]:
os.listdir()

['av_test.yml',
 'visualization.ipynb',
 '.ipynb_checkpoints',
 'stats.ipynb',
 '__pycache__',
 '.git',
 'python_basic.ipynb',
 'temp',
 '.gitignore',
 'python_basic_test.ipynb',
 'JOC',
 'data.py',
 'av.yml']

Creating a directory with sublevels

In [52]:
os.makedirs('temp2/subdir1/subdir3') #creates intermediate directories as needed - preferred over mkdir

In [60]:
print(os.listdir())
os.listdir('temp2')

['av_test.yml', 'visualization.ipynb', '.ipynb_checkpoints', 'stats.ipynb', '__pycache__', '.git', 'python_basic.ipynb', 'temp2', '.gitignore', 'python_basic_test.ipynb', 'JOC', 'data.py', 'av.yml']


['subdir2']

In [63]:
#Removing directories
os.rmdir('temp') #removes a single empty directory; raises OSError if it is not empty - preferred
os.removedirs('temp2/subdir1/subdir3/') #removes the leaf, then each parent directory that becomes empty
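
The difference between the removal functions is easy to get wrong, so here is a small sketch. It works inside a temporary directory (an assumption, so nothing in the working directory is touched) and also uses `shutil.rmtree`, which the cells above do not cover, for deleting a non-empty tree.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                    # hypothetical scratch directory
root = os.path.join(base, 'temp2')
leaf = os.path.join(root, 'subdir1', 'subdir3')

os.makedirs(leaf)
with open(os.path.join(root, 'keep.txt'), 'w'):
    pass                                     # makes temp2 non-empty

# removedirs deletes subdir3, then subdir1 (now empty), and stops at temp2
os.removedirs(leaf)
remaining = os.listdir(root)
print(remaining)

try:
    os.rmdir(root)                           # refuses: directory is not empty
    refused = False
except OSError:
    refused = True
print('rmdir refused:', refused)

shutil.rmtree(root)                          # deletes the tree and everything in it
print(os.path.exists(root))
```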

In [66]:
os.listdir()

['av_test.yml',
 'visualization.ipynb',
 '.ipynb_checkpoints',
 'stats.ipynb',
 '__pycache__',
 '.git',
 'python_basic.ipynb',
 '.gitignore',
 'python_basic_test.ipynb',
 'JOC',
 'data.py',
 'av.yml',
 'temp10']

In [65]:
os.makedirs('temp')
os.rename('temp', 'temp10')

In [67]:
print(os.listdir())
os.rmdir('temp10')

### File Stats (helps in webapps to track timestamps of modified files)

In [68]:
os.stat('./av.yml')

os.stat_result(st_mode=33204, st_ino=31330586, st_dev=2051, st_nlink=1, st_uid=1000, st_gid=1000, st_size=178, st_atime=1537158013, st_mtime=1535236843, st_ctime=1535236843)

In [69]:
from datetime import datetime
modified_time = os.stat('av.yml').st_mtime
print('Last Modified time:',datetime.fromtimestamp(modified_time))

Last Modified time: 2018-08-26 04:10:43.829900


### Traversing Directory tree - os.walk


In [72]:
directory_path = '/home/ubuntu/Documents/'
os.walk(directory_path) # generator object - dir_path, subdirs, files

<generator object walk at 0x7f6cb8215b48>

In [73]:
for dir_path, subdirs, files in os.walk(directory_path):
    print('Current Path:', dir_path)
    print('Directories:', subdirs)
    print('Files:', files)
    print()

Current Path: /home/ubuntu/Documents/
Directories: ['av_2018', '.ipynb_checkpoints', 'python4ds', 'tempo99']
Files: ['import_data.pdf', 'TensorFlow Tutorial For Beginners.ipynb', 'pandas_practice.ipynb', 'sklearn_cs.png', '18990d52-5f15-47cd-b25f-542445901f4e-original.png', '2702d58a-2dca-4135-a10c-2328377d2ff4-original.jpeg', 'f_USTA1032876_20180830_Day4_GE2_6633.jpg', '46c3de34-0c01-41ac-b934-24e0dc012ee2-original.jpeg']

Current Path: /home/ubuntu/Documents/av_2018
Directories: ['.ipynb_checkpoints', '__pycache__', '.git', 'JOC']
Files: ['av_test.yml', 'visualization.ipynb', 'stats.ipynb', 'python_basic.ipynb', '.gitignore', 'python_basic_test.ipynb', 'data.py', 'av.yml']

Current Path: /home/ubuntu/Documents/av_2018/.ipynb_checkpoints
Directories: []
Files: ['PythonForDataScience-checkpoint.ipynb', 'visualization-checkpoint.ipynb', 'stats-checkpoint.ipynb', 'python_basic-checkpoint.ipynb']

Current Path: /home/ubuntu/Documents/av_2018/__pycache__
Directories: []
Files: ['data.cpyth

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/f4
Directories: []
Files: ['d36fe990bfcea7e7d321abe1936727467e172a']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/41
Directories: []
Files: ['198d9e405e0f53460db78b2897a7be3fc75195']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/0b
Directories: []
Files: ['d4a569d1d877d360941754ecf6b4347b48ad30', '0b1f26430d17ed8ed3bfbcbf60a7db808f36bb', '7e250d9b1fb7917e08be596bc93bd541edcb59', '6e3e65d40db2dc08aa560fb46bcab9bef4bb0d']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/49
Directories: []
Files: ['d61ba8b0865bba25da2159d60ee52262cc8488']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/15
Directories: []
Files: ['2e2f6a86003dd548f19585e91b03132631d0c0']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/61
Directories: []
Files: ['e2c7700fbd18cbc306cdf7202576767b749a4a', '520e835b8ef684b8ed5f5fb264992410e69fc2']

Current Path: /home/ubuntu/Documents/python4ds/.git/

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/4d
Directories: []
Files: ['8c57ee321a951b6431d35f8b768dfe8176335a']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/8c
Directories: []
Files: ['1b3dbbab0120d8aedc9ae656f90f9123e5fe87']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/0a
Directories: []
Files: ['dfe1e6d81ead32f10115a8d59fa0d98ca54def']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/da
Directories: []
Files: ['08c3c7e382c41f869a859f1bcd24814207686b']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/5f
Directories: []
Files: ['575976f2cd208cdf7421d63534bd7bd1bc6998']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/d1
Directories: []
Files: ['98cdba65c65aa69d91c6c8b2164d5f7563c73a', 'dcc0dfb7925ce5537a3e14396ad20225dcaa1d']

Current Path: /home/ubuntu/Documents/python4ds/.git/objects/44
Directories: []
Files: ['a19d8c69c6aa542ccaabd2d47767722bc52f5b']

Current Path: /home/ubuntu/Documents/python4ds/.

This also helps in locating a file deep within a directory tree.
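
A minimal sketch of such a search. `find_file` is a hypothetical helper, and the directory tree is built in a temporary location so the example runs anywhere; skipping `.git` is an illustrative choice prompted by the long listing above.

```python
import os
import tempfile

def find_file(root, filename):
    """Return the full paths of every file called `filename` under `root`."""
    matches = []
    for dir_path, subdirs, files in os.walk(root):
        # Pruning subdirs in place stops os.walk from descending into them
        subdirs[:] = [d for d in subdirs if d != '.git']
        if filename in files:
            matches.append(os.path.join(dir_path, filename))
    return matches

# Hypothetical directory tree built for the demo
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'a', 'b', 'c'))
os.makedirs(os.path.join(root, '.git', 'objects'))
for parts in [('a', 'b', 'c', 'target.txt'), ('.git', 'objects', 'target.txt')]:
    with open(os.path.join(root, *parts), 'w') as f:
        f.write('x')

found = find_file(root, 'target.txt')
print(found)   # only the copy outside .git is reported
```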

### Accessing Environment variables

In [96]:
os.environ.get('HOME')

'/home/ubuntu'

In [74]:
os.environ.get('PATH')

'/home/ubuntu/miniconda/envs/av/bin:/home/ubuntu/miniconda/envs/av/bin:/home/ubuntu/miniconda/bin:/home/ubuntu/.nvm/versions/node/v6.11.1/bin:/home/ubuntu/bin:/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin'

### Working with paths - os.path

In [97]:
filename = 'test.txt'
filepath = os.path.join(os.environ.get('HOME'), filename) #handles slashes
print(filepath)

/home/ubuntu/test.txt


In [98]:
print('Base name:', os.path.basename('/home/ubuntu/test.txt')) #final component
print('Directory name:', os.path.dirname('/home/ubuntu/test.txt'))
if not os.path.exists(filepath):
    print('Path does not exist')

Base name: test.txt
Directory name: /home/ubuntu
Path does not exist


In [101]:
print(os.path.split(filepath)) #Splits into head, tail(after final slash)    
print(os.path.isdir('./.git'))
print(os.path.isfile('./.git'))
print(os.path.isfile('./.ipynb_checkpoints/python_basic-checkpoint.ipynb')) #False if it doesn't exist
print(os.path.splitext('./av.yml')) #split into root and extension

('/home/ubuntu', 'test.txt')
True
False
True
('./av', '.yml')


## Collections

In [89]:
from collections import defaultdict, Counter

### 1. defaultdict

In [80]:
d = defaultdict(int) # a missing key is inserted with the default int() == 0, and that value is returned

In [85]:
print(d[2])

0


In [86]:
d[100] += 2

In [87]:
print(d)

defaultdict(<class 'int'>, {9: 1, 2: 0, 100: 2})
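
Two common uses of `defaultdict` are grouping and counting. The sketch below reuses the branch list from the enumerate example; the variable names are illustrative.

```python
from collections import defaultdict

branches = ['Mechanical', 'IT', 'INST', 'CSE', 'IT', 'EXTC']

by_initial = defaultdict(list)      # missing keys start as an empty list
for branch in branches:
    by_initial[branch[0]].append(branch)
print(dict(by_initial))

counts = defaultdict(int)           # missing keys start at 0
for branch in branches:
    counts[branch] += 1
print(counts['IT'], 'Civil' in counts)

# Reading a missing key with [] inserts it; .get() does not
counts['Civil']
print('Civil' in counts, counts.get('Physics'), 'Physics' in counts)
```

The last lines show why printing a `defaultdict` can reveal keys that were only ever read, such as the `2: 0` entry in the output above.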


### 2. Counter

In [91]:
s = 'The new fox jumps over the mountain. The new car is broken.'

In [99]:
Counter(s.split())

Counter({'The': 2,
         'new': 2,
         'fox': 1,
         'jumps': 1,
         'over': 1,
         'the': 1,
         'mountain.': 1,
         'car': 1,
         'is': 1,
         'broken.': 1})

In [100]:
c = Counter(s.split())

In [101]:
c.most_common(3)

[('The', 2), ('new', 2), ('fox', 1)]

## Map, reduce, filter

### 1. Map
- Applies the given function to every element of an iterable, e.g. a list or tuple.
- Syntax: ```map(function_to_apply, collection_of_inputs)```
- Output is a **map object**, which is an iterator over the results
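
Because the result is an iterator, values are produced lazily and can be consumed only once. A short sketch with illustrative names:

```python
squares = map(lambda x: x ** 2, [1, 2, 3])

first = next(squares)        # values are computed on demand
rest = list(squares)         # consumes whatever is left
again = list(squares)        # the iterator is now exhausted

print(first, rest, again)

# map also accepts several iterables and stops at the shortest one
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20]))
print(sums)
```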

**Example 1: Find area of a number of circles given their radii**

In [2]:
import math
def area(r):
    return math.pi*(r**2)
radii = [0.5, 2, 1.3, 1.1, 3.5]

In [4]:
# Method 1 - Direct method: loop over the radii and collect the results
areas = []
for r in radii:
    areas.append(area(r))
# Method 2 - Using map
map(area, radii) # map object - an iterator

<map at 0xb14dd80c>

In [5]:
list(map(area, radii))

[0.7853981633974483,
 12.566370614359172,
 5.3092915845667505,
 3.8013271108436504,
 38.48451000647496]

**Example 2: Given cities' temperatures in degrees Celsius, convert them into Fahrenheit**

In [7]:
temp = [('Berlin', 29), ('Paris', 22), ('Mumbai', 33), ('New York', 17), ('Buenos Aires', 19)]

# F = (9/5)C + 32
c_to_f = lambda data : (data[0], (9/5) * data[1] + 32)

In [8]:
list(map(c_to_f, temp))

[('Berlin', 84.2),
 ('Paris', 71.6),
 ('Mumbai', 91.4),
 ('New York', 62.6),
 ('Buenos Aires', 66.2)]

Instead of a list of inputs, we can also map over a **list of functions!**

In [10]:
def square(x):
    return x ** 2
def incr_three(x):
    return x + 3

funcs = [square, incr_three]
list(map(lambda x: x(5), funcs))

[25, 8]

In [11]:
for i in range(5):
    output = list(map(lambda x: x(i), funcs))
    print(output)

[0, 3]
[1, 4]
[4, 5]
[9, 6]
[16, 7]


### 2. filter
- Selects only certain pieces of data from a collection
- Keeps the elements for which the function returns True
- Syntax: ```filter(function, iterable)```
- **Filters out** the data you do not need
- Output is a **filter object**, an iterator over the results (wrap it in list() to get a list)

In [22]:
# Example1: Find values above average
import statistics
data =  [-1.5, 2.1, 0.1, -0.9, -0.1, 5.2]
avg = statistics.mean(data)

In [23]:
avg

0.8166666666666667

In [24]:
list(filter(lambda x: x > avg, data))

[2.1, 5.2]

In [26]:
# Values below average
list(filter(lambda x : x < avg, data))

[-1.5, 0.1, -0.9, -0.1]

In [29]:
# Example2: Remove empty data
countries = ["", "Argentina", "Ecuador", "", "Brazil", "Chile", "", "", "Colombia", "", "Venezuela"]
list(filter(None, countries))

['Argentina', 'Ecuador', 'Brazil', 'Chile', 'Colombia', 'Venezuela']

- Here we passed None instead of a function, so the values themselves are tested for truthiness.
- Since empty strings are treated as False in Python (refer to False values above), they are filtered out.
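
Note that `filter(None, ...)` drops every falsy value, not only empty strings. A short sketch with made-up data:

```python
mixed = [0, 1, '', 'a', None, [], [0], 0.0, False, True]

kept = list(filter(None, mixed))
print(kept)

# Equivalent list comprehension
same = [x for x in mixed if x]
print(kept == same)

# To drop only None and keep 0 or '', test for None explicitly
readings = [3, 0, None, 7, None]
not_none = [x for x in readings if x is not None]
print(not_none)
```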

### 3. Reduce
- Applies a function cumulatively: the running result is combined with the next value in the list.
- Given data = [a1, a2, ..., an] and function f, reduce computes f(...f(f(a1, a2), a3)..., an)
    * Step 1: val1 = f(a1, a2)
    * Step 2: val2 = f(val1, a3)
    * ...
    * Final step: val(n-1) = f(val(n-2), an)
    * Return val(n-1)
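
The steps above can be sketched in plain Python. `my_reduce` is a hypothetical re-implementation for illustration, not the code of `functools.reduce`; `itertools.accumulate` is used to show the intermediate values.

```python
from functools import reduce
from itertools import accumulate
import operator

def my_reduce(f, data):
    """Hypothetical re-implementation of reduce without an initial value."""
    it = iter(data)
    try:
        result = next(it)            # start with a1
    except StopIteration:
        raise TypeError('my_reduce() of empty iterable') from None
    for item in it:
        result = f(result, item)     # combine the running value with the next element
    return result

values = [4, 12, 19, 3, 21]
print(my_reduce(operator.mul, values))
print(reduce(operator.mul, values))

# accumulate exposes the running values: a1, val1, val2, ...
steps = list(accumulate(values, operator.mul))
print(steps)

# An initial value makes reduce safe on an empty list
print(reduce(operator.mul, [], 1))
```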


In [31]:
# Example1: Find product of elements in list
values = [4, 12, 19, 3, 21]
# Normal Way:
prod = 1
for val in values:
    prod = prod * val
print(prod)    

57456


In [35]:
# Using reduce
from functools import reduce
reduce((lambda x,y: x*y), values)

57456

**BDFL:** "*use functools.reduce if you really need it; however, 99% of the time an explicit for loop is more readable.*"