########################### 3.1 Data Structures and Sequences #############################################

Tuple

A tuple is a fixed-length, immutable sequence of Python objects.

Unpacking tuples

If you assign to a tuple-like expression of variables, Python will attempt to unpack the value on the right-hand side of the equals sign.

In [42]:
tup = 4, 5, 6
# tup

### convert any sequence or iterator to a tuple by invoking tuple:

tuple([3,5,7])

tup = tuple('satish')
tup
tup[2]

### if an object inside a tuple is mutable, such as a list, we can modify it in place:
tup = tuple(['foo', [1, 2], True])
tup[1].append([1,3,4])
tup[1]

### Multiplying a tuple by an integer, as with lists, concatenates that many copies of the tuple

('foo', 'too') * 4

### Unpacking nested sequences (tup[1] is now the list [1, 2, [1, 3, 4]])
(a,b,(c,d,f)) = tup[1]
f

seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
# seq

# for a,b,c in seq:
#     print('a={0}, b={1}, c={2}'.format(a,b,c))
    
a,b = 3,4
a

b,a = a,b
a

values = 1,2,3,4,5,6

a,b,*rest = values
a
rest

a,b,*_ = values
_

a = (1, 2, 2, 2, 3, 4, 2)
a
a.count(2)

4

List

In contrast with tuples, lists are variable-length and their contents can be modified in place. Define them using square brackets [] or the list type function.
Elements can be appended to the end of the list with the append method.
Using insert, you can insert an element at a specific location in the list.
The inverse operation to insert is pop, which removes and returns an element at a particular index.
Elements can be removed by value with remove, which locates the first such value and removes it from the list.
You can check whether a list contains a value using the 'in' keyword.
Adding two lists together with + concatenates them.
If a list is already defined, you can append multiple elements to it using the extend method.

Sorting

We can sort a list in place (without creating a new object) by calling its sort method.
sort accepts an optional sort key, i.e. a function that produces a value to use when sorting the objects,
e.g. to sort a collection of strings by their lengths.
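
The difference between sorting in place and producing a sorted copy can be sketched as follows. The built-in sorted function is not mentioned in the notes above; it is included here only for contrast, and the variable names are illustrative.

```python
# Sample data, chosen only for illustration.
words = ['saw', 'small', 'He', 'foxes', 'six']

# sorted() returns a new list and leaves the original untouched.
by_length = sorted(words, key=len)

# list.sort() reorders the list in place and returns None.
in_place = list(words)
result = in_place.sort(key=len)

print(by_length)   # sorted copy
print(words)       # original order is preserved
print(result)      # None, because sort() works in place
```

Both calls use a stable sort, so strings of equal length keep their original relative order.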

Binary search and maintaining a sorted list

The built-in bisect module implements binary search and insertion into a sorted list. bisect.bisect finds the location where an element should be inserted to keep it sorted, while bisect.insort actually inserts the element into that location.
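
A small sketch of how the left and right variants differ when the list contains duplicates. It assumes the list is already sorted, since the bisect functions do not check this themselves.

```python
import bisect

# Already sorted list with repeated values (bisect does not verify sortedness).
c = [1, 2, 2, 2, 3, 4, 7]

# bisect_left returns the position before any existing equal elements.
left = bisect.bisect_left(c, 2)

# bisect_right (also available as bisect.bisect) returns the position after them.
right = bisect.bisect_right(c, 2)

print(left, right)
print(bisect.bisect(c, 2) == right)
```

Calling these functions on an unsorted list does not raise an error, but the positions they return are not meaningful.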

Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of start:stop passed to the indexing operator [].
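
As a quick illustration of the start:stop convention, the sketch below shows that the start index is included, the stop index is excluded, and the number of elements in the result is therefore stop - start. The list values are only sample data.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

middle = seq[2:5]     # elements at indices 2, 3 and 4
head = seq[:3]        # omitted start defaults to the beginning
tail = seq[3:]        # omitted stop defaults to the end
last_two = seq[-2:]   # negative indices count from the end

print(middle, len(middle) == 5 - 2)
print(head + tail == seq)   # the two halves reassemble the original
print(last_two)
```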

In [108]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
b_list

b_list[1] = 'poo'
b_list

gen = range(10)
gen
list(gen)

b_list.append('hey') ### Elements can be appended to the end of the list
b_list

b_list[2] = 'kay' ### Can replace the element of the list
b_list

b_list.insert(2, 'may') ### Using insert you can insert an element at a specific location in the list
b_list

b_list.pop(2) ### The inverse operation to insert is pop, which removes and returns an element at a particular index
b_list

b_list.append('poo')
b_list.remove('poo') ### Elements can be removed by value with remove, which locates the first such value and removes 
b_list               ### it from the list.

'hey' in b_list ### Can check if a list contains a value using the 'in' keyword.

a_list + b_list ### Adding two lists together with + concatenates them

a_list.extend([6, 7, [34, 67]]) ### If a list is already defined, you can append multiple elements to it using the 
a_list                          ### extend method

################################################ Sorting ######################################################

a = [7, 2, 5, 1, 3]
a.sort() ### can sort a list in place (without creating a new object) by calling its sort method
a

b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len) ### sort a collection of strings by their lengths
b

############################### Binary search and maintaining a sorted list ######################################

import bisect

c = [1, 2, 2, 2, 3, 4, 7]
bisect.bisect(c,8) ### bisect.bisect finds the location where an element should be inserted to keep it sorted
bisect.insort(c,5) ### bisect.insort actually inserts the element into that location & keeps list sorted.
c ### [1, 2, 2, 2, 3, 4, 5, 7]

bisect.bisect_left(c,6)
bisect.bisect_right(c,9)


################################################ Slicing ######################################################

seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[2:5]

seq[2:6] = [9,8,11,43,55,2,5] ### Slices can also be assigned to with a sequence
seq ### [7, 2, 9, 8, 11, 43, 55, 2, 5, 0, 1]

# seq[3:] ### the element at the start index is included, the stop index is not included
# seq[:3] ### when start is omitted the slice begins at index 0; the element at the stop index is not included
# seq[4:7]
# seq[-2:] ### Negative indices slice the sequence relative to the end
# seq[:-3]
# seq[-6:-3]

[11, 43, 55]

Built-in Sequence Functions

1) enumerate

Python has a built-in function called enumerate, which keeps track of the index of the current item while iterating over a sequence and returns a sequence of (i, value) tuples.

2) Zip

zip “pairs” up the elements of a number of lists, tuples, or other sequences to create a sequence of tuples (in Python 3 it returns an iterator, so wrap it in list to see the pairs). A common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate.
Given a “zipped” sequence, zip can be applied to “unzip” the sequence.

3) Reverse

reversed iterates over the elements of a sequence in reverse order. It returns an iterator, so wrap it in list to materialize the result.

4) Dict

Dict is likely the most important built-in Python data structure. Common names for it are hash map and associative array.
It is a flexibly sized collection of key-value pairs, where key and value are Python objects. One approach for creating one is to use curly braces {} and colons to separate keys and values. You can access, insert, or set elements of a dict using the same syntax as for accessing elements of a list or tuple.

While the values of a dict can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability.
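
One way to probe hashability is to call the built-in hash function and catch the TypeError it raises for unhashable objects. The helper name is_hashable below is hypothetical and exists only for this sketch.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable('apple'))            # strings are hashable
print(is_hashable((1, 2, (3, 5))))     # tuples of hashable objects are hashable
print(is_hashable((1, 2, [3, 5])))     # a tuple containing a list is not
print(is_hashable([1, 2]))             # lists are mutable, so not hashable
```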

5) Set

A set is an unordered collection of unique elements. You can think of a set as a dict with keys only and no values. Sets support mathematical set operations like union, intersection, difference, and symmetric difference.
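
Difference and symmetric difference can be sketched with the same kind of sample sets used for union and intersection. The values are illustrative only.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Elements that are in a but not in b.
only_in_a = a - b              # same as a.difference(b)

# Elements that are in exactly one of the two sets.
in_one_only = a ^ b            # same as a.symmetric_difference(b)

print(only_in_a)
print(in_one_only)
```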

6) List comprehensions

List comprehensions allow you to concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one expression.
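
To make the filter-then-transform idea concrete, the sketch below writes the same operation twice: once as an explicit loop and once as a list comprehension. The sample strings are illustrative.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Explicit loop: filter with the if, transform with upper().
loop_result = []
for x in strings:
    if len(x) > 2:
        loop_result.append(x.upper())

# Equivalent list comprehension: [expression for item in collection if condition]
comp_result = [x.upper() for x in strings if len(x) > 2]

print(comp_result)
print(loop_result == comp_result)
```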

In [121]:
################################################ enumerate ######################################################

some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, value in enumerate(some_list):
    mapping[value] = i
mapping

################################################ Zip ######################################################

seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)
list(zipped)

seq3 = ['34', '33','45','55']
zipped = zip(seq1, seq2, seq3)
list(zipped)

### common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate
# for i, (a, b, c) in enumerate(zip(seq1, seq2, seq3)):
#     print('{0}: {1}, {2}, {3}'.format(i, a, b, c))

### Given a “zipped” sequence, zip can be applied to “unzip” the sequence.
pitcher =  [('Nolan', 'Ryan'), ('Roger', 'Clemens'),('Schilling', 'Curt')]
first_name, last_name = zip(*pitcher)
first_name
last_name

################################################ Reverse ######################################################

list(reversed(range(10))) ### Reversed iterates over the elements of a sequence in reverse order.

################################################ Dict ######################################################

d1 = {'a':'some values', 'b':[1,2,3,4]}


d1[8] = "^%SHPLP23"

d1[8]

8 in d1 ###check if a dict contains a key using same syntax used for checking whether a list or tuple contains a value

# d1.pop(3) ### delete values using the pop method 

d1[3] = "$ & #"
 
del d1[3] ### delete values using the del keyword

d1.update({'d' : 'foo', 'c' : 12}) ### merge one dict into another using the update method

d1.keys() ### The keys and values methods give iterators of the dict’s keys and values
d1.values()

list(d1.values())

######################################## Creating dicts from sequences ##############################################

mapping = dict(zip(range(5), reversed(range(5)))) ### the dict function accepts a list of 2-tuples
mapping

### categorizing a list of words by their first letters as a dict of lists

words = ['apple', 'bat', 'bar', 'atom', 'book']

# by_letter = {}
# for word in words:
#     letter = word[0]
#     if letter not in by_letter:
#         by_letter[letter] = [word]
#     else:
#         by_letter[letter].append(word) 
# by_letter

### The setdefault dict method is for precisely this purpose 
# by_letter = {}
# for word in words:
#     letter = word[0]
#     by_letter.setdefault(letter,[]).append(word)
# by_letter


### The built-in collections module has a useful class defaultdict, which makes this even easier.

from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
by_letter

hash('apple') ###check whether an object is hashable (can be used as a key in a dict) with the hash function
hash((1,2,(3,5)))
# hash((1,2,[3,5,7])) ### fails because lists are mutable

d = {}
d[tuple([3,5,7])] = 8 ### To use a list as a key, convert it to a tuple, which can be hashed as long as its elements can
d

################################################ Set ######################################################

# set([2, 2, 2, 1, 3, 3]) ### A set is an unordered collection of unique elements; it can be created via the set function

{2, 2, 2, 1, 3, 3} ### can be created via a set literal with curly braces

a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

a.union(b) ### The union of these two sets is the set of distinct elements occurring in either set.
a | b

a.intersection(b) ### The intersection contains the elements occurring in both sets
a & b

c = a.copy()
# c |= b ### Set the contents of c to be the union of the elements in c and b
c &= b ### Set the contents of c to be the intersection of the elements in c and b
c

c.issubset(a) ### To check if a set is a subset of another set
a.issuperset(c)### To check if a set is a superset of another set
c == {5,4,3} ### Sets are equal if and only if their contents are equal


c = list(tuple(c))### Like dict keys, set elements must be hashable; to store list-like data in a set, convert it to a tuple. Here the set c is converted to a list
c

######################################## List comprehensions ##############################################

strings = ['a', 'as', 'bat', 'car', 'dove', 'python', 'Jason', '#$%^&!*@'] ### For given a list of strings
# [x.upper() for x in strings if len(x) > 2] ### keep only strings longer than 2 characters and convert them to uppercase

######################################## Set and dict comprehensions ##############################################

dict_comp = {val: index for index, val in enumerate(strings)} ### dict comprehension to map strings to their locations
dict_comp

# string_len = {len(x) for x in strings}### a set containing just the lengths of the strings contained in the collection
# string_len

######################################## Nested list comprehensions ##############################################

### we have a list of lists containing names
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

### to get a single list containing all names with two or more 'e's in them
# interested_names = []
# for names in all_data:
#     for name in names:
#         if name.count('e') >= 2:
#             interested_names.append(name)
# interested_names            

### wrap this whole operation up in a single nested list comprehension
results = [name for names in all_data for name in names if name.count('e') >=2]
results


some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
# flatten = [[x for x in tup] for tup in some_tuples]
flatten = [x for tup in some_tuples for x in tup]
flatten ### flattens a list of tuples of integers into a single list of integers

[1, 2, 3, 4, 5, 6, 7, 8, 9]

###################################### 3.2 Functions ##############################################################

If you need to repeat the same or very similar code more than once, it is worth writing a reusable function. Functions are declared with the def keyword and returned from with the return keyword.

There is no issue with having multiple return statements. If Python reaches the end of a function without encountering a return statement, None is returned.
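
A minimal sketch of both points: a function with several return statements, plus one code path that falls off the end and therefore returns None. The function name describe is hypothetical.

```python
def describe(n):
    """Classify a number (hypothetical example function)."""
    if n > 0:
        return 'positive'
    if n < 0:
        return 'negative'
    # No return statement is reached when n == 0, so None is returned.

print(describe(5))
print(describe(-2))
print(describe(0))
```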

Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments. The main restriction on function arguments is that the keyword arguments must follow the positional arguments.

Functions can access variables in two different scopes: global and local. An alternative name for a variable scope in Python is a namespace. Any variables that are assigned within a function are by default assigned to the local namespace. The local namespace is created when the function is called and is immediately populated by the function’s arguments. After the function finishes, the local namespace is destroyed.
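
The distinction between assigning to a name inside a function and modifying an object that lives outside it can be sketched as below. All names are hypothetical and chosen only to illustrate scoping.

```python
shared = []

def rebind_locally():
    # Assignment creates a new local variable; the outer list is untouched.
    shared = ['local only']
    return shared

def mutate_outer():
    # No assignment here, so the name resolves to the outer list,
    # which is then modified in place.
    shared.append('from function')

local_result = rebind_locally()
mutate_outer()

print(local_result)
print(shared)
```

To rebind a global name from inside a function, rather than just modify the object it refers to, Python requires declaring the name with the global keyword.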

1) Anonymous (Lambda) Functions

Python has support for so-called anonymous or lambda functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword.

2) Generators

Having a consistent way to iterate over sequences, like objects in a list or lines in a file, is an important Python feature. This is accomplished by means of the iterator protocol, a generic way to make objects iterable.
A generator is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested.
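
The iterator protocol and the lazy behavior of generators can be observed directly with the built-in iter and next functions. This is a small sketch with illustrative data.

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}

# iter() asks an iterable for an iterator; next() pulls one item at a time.
dict_iterator = iter(some_dict)
first = next(dict_iterator)
rest = list(dict_iterator)     # consumes whatever is left

def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2           # execution pauses here until the next request

gen = squares(3)
values = [next(gen), next(gen), next(gen)]
exhausted = next(gen, None)    # the default is returned once the generator is done

print(first, rest)
print(values, exhausted)
```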

3) itertools module

The standard library itertools module has a collection of generators for many common data algorithms.
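
As one example, itertools.groupby groups only consecutive elements that share a key, so unsorted input can yield the same key more than once. The sketch below contrasts unsorted and sorted input; the names list is sample data.

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

def first_letter(name):
    return name[0]

# Without sorting, 'A' appears twice because 'Albert' is not adjacent to the other A names.
unsorted_groups = [(key, list(group))
                   for key, group in itertools.groupby(names, first_letter)]

# Sorting by the same key first gives one group per letter.
sorted_names = sorted(names, key=first_letter)
sorted_groups = {key: list(group)
                 for key, group in itertools.groupby(sorted_names, first_letter)}

print(unsorted_groups)
print(sorted_groups)
```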

4) Errors and Exception Handling

Handling Python errors or exceptions gracefully is an important part of building robust programs.
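
One possible pattern, sketched under the assumption that only conversion failures should be tolerated: catch the specific exceptions float can raise for bad input, and use an else clause for code that should run only when the try block succeeds. The function name parse_all is hypothetical.

```python
def parse_all(values):
    """Split values into those float() can convert and those it cannot (hypothetical helper)."""
    parsed, failed = [], []
    for value in values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            # ValueError: a string that is not a number; TypeError: an unsupported type.
            failed.append(value)
        else:
            # Runs only if the try block raised no exception.
            parsed.append(number)
    return parsed, failed

parsed, failed = parse_all(['1.5', 'abc', (1, 2), 3])
print(parsed)
print(failed)
```

Naming the exceptions keeps unrelated errors visible instead of silently swallowing them.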

In [195]:
###################################### Functions ##############################################################

def my_function(x, y, z=1.5): ### x and y are positional arguments and z is a keyword argument.
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# my_function(5, 6, z=0.7)
# my_function(3.14, 7, 3.5)
# my_function(10, 20)    
# my_function(x=5, y=6, z=7) ### the keyword arguments must follow the positional arguments
my_function(y=6, x=5, z=7) ### keyword arguments can be passed in any order

#### Local Namespace example

def func(): ### When func() is called
    a = [] ### an empty list is created
    for i in range(5):
        a.append(i) ### five elements are appended to the list
# print(a) ### print the list 'a' (only meaningful inside the function body)
### a is destroyed when the function exits

a = []
def func():
    for i in range(5):
        a.append(i)
a

######################################## function return Multiple values ####################################

def func():
    a = 3
    b = 5
    c = 8
    return a, b, c ### the function is returning one object as a tuple with 3 elements
func()

def func():
    a = 3
    b = 5
    c = 8
    return {'a': a, 'b': b, 'c': c} ### the function is returning dictionary object with 3 elements
func()

######################################## Functions are objects ####################################

### The data given below is messy; use string methods with the standard library module re to clean it up

states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda','south carolina##', 'West virginia?']

import re

def clean_string(strings):
    result=[]
    for value in strings:
        value = value.strip() ### stripping whitespace
        value = re.sub('[#?!]', '', value) ### removing punctuation symbols
        value = value.title() ###  standardizing on proper capitalization
        result.append(value)
    return result
clean_string(states)

######################################## Anonymous (Lambda) Functions ####################################

def apply_to_list(some_list, f):
    return [f(x) for x in some_list]
    
ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

### to sort a collection of strings by the number of distinct letters in each string
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
strings.sort(key = lambda x: len(set(x))) ### pass a lambda function to the list’s sort method
strings

######################################## Generator #######################################################

def squares(n=10):
    for i in range(1, n+1):
        yield i ** 2 ### yield keyword is used to create a generator
squares()  ### when you call the generator, no code is immediately executed

# for x in squares(): ### only when you request elements from the generator does it begin executing its code
#     print(x, end=' ')
    
######################################## Generator expressions ############################################

gen = (x ** 2 for x in range(10))
gen
# sum(x ** 2 for x in range(10))
dict((x, x ** 2) for x in range(6))

######################################## itertools module ############################################

### To group consecutive elements in the sequence by return value of the function.

import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# for letter, name in itertools.groupby(names, first_letter): ### groupby(iterable[, keyfunc])
#     print(letter, list(name))

######################################## Errors and Exception Handling ############################################    

### Suppose we wanted a version of float that fails gracefully, returning the input argument

def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError): ### The code in the except part of the block will be executed only if float(x) raises one of these exceptions
        return x
    finally: ### The code in the finally block runs whether or not the try block succeeds
        print('It is done')

attempt_float(1.23) ### return 1.23
# attempt_float('something') ### return something

It is done


1.23

###################################### 3.3 Files and the Operating System #########################################

To open a file for reading or writing, use the built-in open function with either a relative or absolute file path.

For readable files the most commonly used methods are read, seek, and tell.
To write text to a file, use the file’s write or writelines methods.
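
A self-contained sketch of these methods, using a temporary directory so that it does not depend on any particular file existing. The file name example.txt and the variable names are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'example.txt')   # hypothetical file name

    # write takes a single string; writelines takes an iterable of strings.
    with open(demo_path, 'w', encoding='utf-8') as handle:
        handle.write('first line\n')
        handle.writelines(['second line\n', 'third line\n'])

    with open(demo_path, encoding='utf-8') as f:
        head = f.read(5)       # read the first five characters
        position = f.tell()    # current position in the file
        f.seek(0)              # jump back to the start
        lines = f.readlines()  # read all lines from the new position

print(head)
print(lines)
```

Note that writelines does not add line endings, so each string carries its own newline.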

1) Bytes and Unicode with Files

UTF-8 is a variable-length Unicode encoding, so when you request some number of characters from the file, Python reads enough bytes from the file to decode that many characters.
Text mode, combined with the encoding option of open, provides a convenient way to convert from one Unicode encoding to another.
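
The relationship between characters and bytes can be sketched without a file, using str.encode and bytes.decode. The sample text is an assumption for illustration; it contains one non-ASCII character.

```python
text = 'Sueña el rico'   # sample text with one non-ASCII character

utf8_bytes = text.encode('utf-8')
latin1_bytes = text.encode('iso-8859-1')

# In UTF-8 the character ñ takes two bytes; in ISO-8859-1 it takes one.
print(len(text), len(utf8_bytes), len(latin1_bytes))

# Cutting UTF-8 bytes in the middle of a character makes them undecodable.
try:
    utf8_bytes[:4].decode('utf-8')
    partial_decode_failed = False
except UnicodeDecodeError:
    partial_decode_failed = True

print(partial_decode_failed)
print(latin1_bytes.decode('iso-8859-1') == text)
```

This is why reading a fixed number of bytes in binary mode can split a character, while reading characters in text mode cannot.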


In [256]:
###################################### 3.3 Files and the Operating System #########################################

path = 'C:/Users/Satish/python_files/PythonforDataAnalysis/segismundo.txt'
f = open(path)
# for line in f:
#     print(line)

f.close() ### When you use open to create a file object, closing the file releases its resources back to the operating system

lines = [x.rstrip() for x in open(path)]
lines

with open(path) as f: ### to automatically close the file f when exiting use the 'with' block.
    lines = [x.strip() for x in f]
lines

f = open(path)
f.read(10) ### read returns a certain number of characters from the file
           ### what constitutes a “character” is determined by the file’s encoding (e.g., UTF-8) 

f.tell()   ### tell gives you the current position   


# f2 = open(path, 'rb') ### binary mode
# f2.read(10)
# f2.close()

import sys
sys.getdefaultencoding() ### To check the default encoding in the sys module
# 'utf-8'
f.seek(3) ### seek changes the file position to the indicated byte in the file
f.read(6) ### read now starts from position 3, because seek moved the file position there
f.close()

### To write text to a file, use the file’s write or writelines methods, e.g. to create a version of the file at path
### with no blank lines

with open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)

with open('tmp.txt') as f:
    lines = f.readlines()
lines

######################################## Bytes and Unicode with Files ############################################

with open(path) as f: ### file contains non-ASCII characters with UTF-8 encoding
    chars = f.read(10)
chars

with open(path, 'rb') as f: ### when the file is opened in 'rb' (binary) mode, read requests an exact number of bytes
    data = f.read(10)
data
# data.decode('utf8') ### Depending on the text encoding, we can decode the bytes to a str object


### Text mode combined with the encoding option of open, provides a way to convert from one Unicode encoding to another
sink_path = 'sink.txt'
with open(path) as source:
    with open(sink_path, 'xt', encoding='iso-8859-1') as sink: ### 'x' mode creates the file and fails if sink_path already exists
        sink.write(source.read())

with open(sink_path, encoding='iso-8859-1') as f:
    print(f.read(10))

'Sueña el 
