# Data Visualization
# BTech Computer Science Stream , January 2025
# Week 2 Python Language Basics - Demonstration Code
# Name:Satwik Hegde(PG-08), Reg Number - 240911676 , Date: 24/12/2024
# This notebook demonstrates Python’s workhorse data structures (tuples, lists, dicts, and sets) and discusses creating your own reusable Python functions

The following naming conventions are used for Python's data structures:

tuple - tup
sequence - seq
list - variablename_list
dict - dict
set - variablename_set




# **Data Structures and Sequences**

**Tuple**

A tuple is a fixed-length, immutable sequence of Python objects.

In [None]:
tup = (4, 5, 6)
tup

(4, 5, 6)

When defining tuples in more complicated expressions, it’s often necessary to enclose the values in parentheses, as in this example of
creating a tuple of tuples:

In [None]:
tup = (4, 5, 6), (7, 8, 5)
tup

((4, 5, 6), (7, 8, 5))

In [None]:
### convert any sequence or iterator to a tuple by invoking tuple
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

In [None]:
### Elements can be accessed with square brackets [] as with most other sequence types. Sequences are 0-indexed in Python
tup[0]

's'

In [None]:
nested_tup = (4, 5, 6), (7, 8),(1,2,3)
nested_tup
nested_tup[2]
#nested_tup[1]

(1, 2, 3)

***iNote*** : While the objects stored in a tuple may be mutable themselves, once the tuple is created it’s not possible to modify which object is stored in each slot.

In [None]:
tup = tuple(['foo', [1, 2, 3], True])
#tup[2] = False
tup

('foo', [1, 2, 3], True)
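
As a minimal sketch of the note above, assigning to a slot of a tuple raises `TypeError` and leaves the tuple unchanged. The names `example_tup` and `error_message` are illustrative and not part of the notebook.

```python
# Illustrative names; not part of the original notebook.
example_tup = tuple(['foo', [1, 2, 3], True])

try:
    example_tup[2] = False  # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

print(example_tup)  # the tuple is unchanged
```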

In [None]:
### If an object inside a tuple is mutable, such as a list, you can modify it in place
tup[1].append(4)
tup

('foo', [1, 2, 3, 4], True)

In [None]:
### You can concatenate tuples using the + operator to produce longer tuples
(4, None, 'foo') + (6, 0) + ('bar',)+ (1,2,3)

(4, None, 'foo', 6, 0, 'bar', 1, 2, 3)

In [None]:
### Multiplying a tuple by an integer, as with lists, has the effect of concatenating together that many copies of the tuple
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

In [None]:
### Unpacking tuples: If you try to assign to a tuple-like expression of variables, Python will attempt to unpack the value on the
### righthand side of the equals sign

tup = (4, 5, 6)
a, b, c = tup
c

6

In [None]:
### Sequences with nested tuples can be unpacked
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
a

4

In [None]:
### Using this functionality you can easily swap variables
a, b = 1, 2
a
b
b, a = a, b
print (a , b)

2 1


In [None]:
### A common use of variable unpacking is iterating over sequences of tuples or lists
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


In [None]:
### more advanced tuple unpacking to help with situations where you may want to “pluck” a few elements from the beginning of a tuple.
### This uses the special syntax *rest, which is also used in function signatures to capture an arbitrarily long list of positional arguments
values = 1, 2, 3, 4, 5
a, b, *rest = values
a
b
rest

[3, 4, 5]

In [None]:
### This rest bit is sometimes something you want to discard; there is nothing special about the rest name.
### As a matter of convention, many Python programmers will use the underscore (_) for unwanted variables

a, b, *_ = values
a

1
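
The `*rest` syntax can be sketched on both sides: when unpacking, the starred name collects a list; in a function signature, it collects a tuple. The helper `describe` below is hypothetical and exists only for this illustration.

```python
# Hypothetical helper for illustration; not part of the original notebook.
def describe(first, second, *rest):
    """Return the first two arguments and a tuple of any extras."""
    return first, second, rest

values = 1, 2, 3, 4, 5

a, b, *rest = values                        # unpacking: rest is a list
print(rest)

first, second, extras = describe(*values)   # call: extras is a tuple
print(extras)
```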

**Tuple methods**

Since the size and contents of a tuple cannot be modified, it is very light on instance methods.

In [None]:
### A useful method (also available on lists) is count, which counts the number of occurrences of a value.
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

# **Lists**
Lists are variable-length and their contents can be modified in-place.
You can define them using square brackets [] or using the list type function.

In [None]:
a_list = [2, 3, 7, None]
a_list

tup = ("foo", "bar", "baz")
tup
b_list = list(tup)
b_list
b_list[1] = "peekaboo"
b_list

['foo', 'peekaboo', 'baz']

***iNote*** :Lists and tuples are semantically similar (though tuples cannot be modified) and can be used interchangeably in many functions.

In [None]:
### The list function is frequently used in data processing as a way to materialize an iterator or generator expression
gen = range(10)
gen
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

**Adding and removing elements**

In [None]:
### Elements can be appended to the end of the list with the append method
b_list.append("dwarf")
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

In [None]:
### Using insert you can insert an element at a specific location in the list, the insertion index must be between 0 and the length of the list, inclusive
b_list.insert(1, "red")
b_list


['foo', 'red', 'peekaboo', 'baz', 'dwarf']

***iNote*** : ***insert*** is computationally expensive compared with append, because references to subsequent elements have to be shifted internally to make room for the new element.

If you need to insert elements at both the beginning and end of a sequence, you may wish to explore ***collections.deque***, a double-ended queue, which is optimized for this purpose.
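
A minimal sketch of `collections.deque`, assuming a small illustrative list of strings (the name `queue` is not from the notebook). It shows insertion and removal at both ends.

```python
from collections import deque

# Illustrative data; the name `queue` is an assumption for this sketch.
queue = deque(["red", "peekaboo", "baz"])
queue.appendleft("foo")   # add at the beginning
queue.append("dwarf")     # add at the end
print(queue)

first = queue.popleft()   # remove from the beginning
last = queue.pop()        # remove from the end
print(first, last, list(queue))
```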

In [None]:
### The inverse operation to insert is pop, which removes and returns an element at a particular index
b_list.pop(2)
b_list

['foo', 'red', 'baz', 'dwarf']

In [None]:
### Elements can be removed by value with remove, which locates the first such value and removes it from the list
b_list.append("foo")
b_list.remove("foo")
b_list

['red', 'baz', 'dwarf', 'foo']

***iNote*** :If performance is not a concern, by using ***append*** and ***remove***, you can use a Python list as a set-like data structure

In [None]:
### We can Check if a list contains a value using the in keyword
"dwarf" in b_list

True

In [None]:
### The keyword not can be used to negate in
"dwarf" not in b_list

False

***iNote*** : Checking whether a list contains a value is a lot slower than doing so with dicts and sets, as Python makes a linear scan
across the values of the list, whereas it can check the others (based on hash tables) in constant time.
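
One way to act on this, sketched with arbitrary sizes and illustrative names (`big_list`, `big_set`), is to convert a list to a set once when many membership checks are expected. Both containers give the same answers; only the lookup strategy differs.

```python
# Illustrative names; the size is arbitrary.
big_list = list(range(100_000))
big_set = set(big_list)   # one-time conversion cost

# The list is scanned element by element; the set uses hashing.
print(99_999 in big_list)
print(99_999 in big_set)
print(-1 in big_set)
```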

**Concatenating and combining lists**

In [None]:
### Similar to tuples, adding two lists together with + concatenates them
[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [None]:
### You can append multiple elements to a list already defined, using the extend method
x = [4, None, "foo"]
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

**Sorting**

In [None]:
### sort a list in-place (without creating a new object) by calling its sort method
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [None]:
### sort can take a sort key: a function that produces a value used to order the objects (here, the string length)
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)
#b.sort(key=lambda x: x[1])
b

['He', 'saw', 'six', 'small', 'foxes']
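
As a sketch of combining sort keys, a key function can return a tuple so that ties on the first element are broken by the second. The variable names here are illustrative; `sorted` is the built-in that returns a new list instead of sorting in place.

```python
words_to_sort = ["saw", "small", "He", "foxes", "six"]

# Primary key: length; secondary key: case-insensitive alphabetical order.
words_to_sort.sort(key=lambda w: (len(w), w.lower()))
print(words_to_sort)

# sorted() returns a new list and leaves the original untouched.
longest_first = sorted(words_to_sort, key=len, reverse=True)
print(longest_first)
```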

**Slicing**

In [None]:
### select sections of most sequence types by using slice notation, which in its basic form consists of start:stop passed to the indexing operator []
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [None]:
### Slices can also be assigned to with a sequence
seq[3:5] = [6, 3]
seq

[7, 2, 3, 6, 3, 6, 0, 1]

In [None]:
### Either the start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively
seq[:5]
#seq[3:]


[7, 2, 3, 6, 3]

In [None]:
### Negative indices slice the sequence relative to the end
#seq[-4:]
seq[-6:-2]

[3, 6, 3, 6]

**Slicing semantics**

In [None]:
### A step can also be used after a second colon to, say, take every other element
seq[::2]

[7, 3, 3, 0]

In [None]:
### pass -1, which has the useful effect of reversing a list or tuple
seq[::-1]

[1, 0, 6, 3, 6, 3, 2, 7]
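
The slicing rules can be summarized in a short sketch (the names `seq_demo` and `part` are illustrative): the start index is included, the stop index is excluded, a slice of a list is a new list, and out-of-range bounds are clipped.

```python
seq_demo = [7, 2, 3, 6, 3, 6, 0, 1]

start, stop = 1, 5
part = seq_demo[start:stop]
# start is included, stop is excluded, so the length is stop - start
print(part, len(part) == stop - start)

# Slicing a list creates a new list; changing it leaves the original alone
part[0] = 99
print(seq_demo[1])

# Out-of-range slice bounds are clipped rather than raising an error
print(seq_demo[5:100])
```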

# **Dictionary** : dict

A ***dict*** stores a collection of key-value pairs, where key and value are Python objects (since Python 3.7, iteration follows insertion order).
Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key.


In [None]:
### One approach for creating a dict is to use curly braces {} and colons to separate keys and values.
empty_dict = {}
#empty_dict
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [None]:
### You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple
d1[7] = "an integer"
d1
d1["b"]

[1, 2, 3, 4]

In [None]:
### You can check if a dict contains a key using the same syntax used for checking whether a list or tuple contains a value
"c" in d1

False

In [None]:
### You can delete values either using the del keyword or the pop method (which simultaneously returns the value and deletes the key)
d1[5] = "some value"
d1
d1["dummy"] = "another value"
d1
del d1[5]
d1
ret = d1.pop("dummy")
ret
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [None]:
### The keys and values methods give you iterators of the dict’s keys and values, respectively.
### The order of the keys depends on the order of their insertion, and these functions output the keys and values in the same respective order
list(d1.keys())
#list(d1.values())

['a', 'b', 7]

In [None]:
list(d1.items())

[('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]

In [None]:
### You can merge one dict into another using the update method
d1.update({"b": "foo", "c": 12})
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

**Creating dicts from sequences**

In [None]:
### two sequences that you want to pair up element-wise in a dict
list(range(5))
list(reversed(range(5)))
tuples = zip(range(5), reversed(range(5)))
dict(tuples)

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

**Default values**

In [17]:
some_dict ={'a' : 'some value', 'b' : [1, 2, 3, 4]}
key='c'  # Replace with the key you want to check
default_value = 'not found' # Replace with your desired default value
if key in some_dict:
  value = some_dict[key]
else:
  value = default_value

print(value)

not found


In [22]:
### the dict methods get and pop can take a default value to be returned

key='c'
value = some_dict.get(key, default_value)

print(value)

not found
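
A short sketch of how `get` and `pop` treat a missing key, with and without a default. The dict `demo_dict` is illustrative and mirrors the one used above.

```python
demo_dict = {"a": "some value", "b": [1, 2, 3, 4]}

print(demo_dict.get("c"))               # missing key, no default: None
print(demo_dict.pop("c", "not found"))  # missing key, default returned

try:
    demo_dict.pop("c")                  # missing key, no default
except KeyError as exc:
    missing_key = exc.args[0]
    print(f"KeyError for key {missing_key!r}")
```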


***get*** by default will return None if the key is not present, while ***pop*** will
raise an exception. With setting values, it may be that the values in a dict
are another kind of collection, like a list. For example, you could imagine
categorizing a list of words by their first letters as a dict of lists

In [3]:

words = ["apple", "bat", "bar", "atom", "book"]
#words
by_letter = {}

for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

In [24]:
### The setdefault dict method can be used to simplify this workflow. The preceding for loop can be rewritten as:

by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

***iNote*** :The built-in ***collections*** module has a useful class, ***defaultdict***, which makes this even easier. To create one, you pass a type or function for
generating the default value for each slot in the dict.

In [25]:
from collections import defaultdict

by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})
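
The default factory does not have to be `list`. As a sketch with illustrative names (`counts`, `labels`), `int` gives a zero-initialized counter and any zero-argument callable, such as a lambda, also works.

```python
from collections import defaultdict

words_demo = ["apple", "bat", "bar", "atom", "book"]

# int() returns 0, so every new key starts counting from zero
counts = defaultdict(int)
for word in words_demo:
    counts[word[0]] += 1
print(dict(counts))

# Any zero-argument callable works as the default factory
labels = defaultdict(lambda: "unknown")
labels["a"] = "vowel"
print(labels["a"], labels["z"])
```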

**Valid dict key types**

While the values of a dict can be any Python object, the keys generally have to be immutable objects like
scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability.
You can check whether an object is hashable (can be used as a key in a dict) with the hash function.

In [None]:
hash("string")
hash((1, 2, (2, 3)))
#hash((1, 2, [2, 3])) # fails because lists are mutable

-9209053662355515447

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can

In [None]:
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}
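
The hashability check can be wrapped in a small helper. `is_hashable` below is a hypothetical function written for this sketch, not a built-in; it simply reports whether `hash` raises `TypeError`.

```python
# Hypothetical helper for illustration; not a built-in.
def is_hashable(obj):
    """Return True if hash(obj) succeeds, otherwise False."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, (2, 3))))   # tuple of immutables
print(is_hashable((1, 2, [2, 3])))   # tuple containing a list
print(is_hashable([1, 2, 3]))        # list
```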

# **Set**
A set is an unordered collection of unique elements. You can think of them like dict keys, but keys only, no values.
Because a set is unordered, it is also unindexed: elements cannot be accessed by position.

*Note:* Individual set items cannot be changed in place, but the set itself is mutable: you can remove items and add new items.

In [None]:
### A set can be created in two ways: via the set function or via a set literal with curly braces
set([2, 2, 2, 1, 3, 3])
#{2, 2, 2, 1, 3, 3}

{1, 2, 3}

In [None]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}
a
b

{3, 4, 5, 6, 7, 8}

In [None]:
### The union of these two sets is the set of distinct elements occurring in either set.
### This can be computed with either the union method or the | binary operator

#a.union(b)
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
### The intersection contains the elements occurring in both sets.
### The & operator or the intersection method can be used

#a.intersection(b)
a & b

{3, 4, 5}

In [None]:
### The in-place counterparts (|=, &=) replace the contents of the set on the left side with the result, which can be more efficient for very large sets.

c = a.copy()
c |= b
c
#d = a.copy()
#d &= b
#d

{1, 2, 3, 4, 5, 6, 7, 8}

Like a dict’s keys, a set’s elements generally must be **immutable**, and they must be **hashable** (which means that calling hash on a value
does not raise an exception). In order to store list-like elements (or other mutable sequences) in a set, you can convert them to tuples.

In [None]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

In [None]:
### You can also check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set
### Refer to book for list of Python set methods: add, clear, update etc.

a_set = {1, 2, 3, 4, 5}
{1, 2, 3, 7}.issubset(a_set)
#a_set.issuperset({1, 2, 3})

False

In [None]:
### Sets are equal if and only if their contents are equal

#{1, 2, 3} == {3, 2, 1}
{0, 2, 3} == {3, 2, 1}

False
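
A few of the remaining set operations, sketched with illustrative names (`a_demo`, `b_demo`): difference, symmetric difference, and the `add` and `discard` methods that modify a set in place.

```python
a_demo = {1, 2, 3, 4, 5}
b_demo = {3, 4, 5, 6, 7, 8}

only_a = a_demo - b_demo   # difference: in a_demo but not in b_demo
either = a_demo ^ b_demo   # symmetric difference: in exactly one set
print(only_a, either)

# Sets themselves are mutable: elements can be added and removed
a_demo.add(6)
a_demo.discard(1)          # discard does nothing if the value is absent
a_demo.discard(100)
print(a_demo)
```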

**Some more functions**

In [None]:
sorted([7, 1, 2, 6, 0, 3, 2])
#sorted("horse race")

[0, 1, 2, 2, 3, 6, 7]

In [None]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [None]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

In [None]:

for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")


0: foo, one
1: bar, two
2: baz, three


In [None]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

**List, Set, and Dict Comprehensions**

List comprehensions are a convenient and widely-used Python language
feature. They allow you to concisely form a new list by filtering the
elements of a collection, transforming the elements passing the filter in one concise expression.

syntax: [expr for val in collection if condition]

In [None]:
### filter out strings with length 2 or less and also convert them to uppercase

strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

***iNote*** :A set comprehension looks like the equivalent list comprehension except
with curly braces instead of square brackets:

set_comp = {*expr* for value in collection if *condition*}

In [None]:
### set comprehension
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

In [None]:
### same as above using map function
set(map(len, strings))

{1, 2, 3, 4, 6}

In [None]:
### create a lookup map of these strings to their locations in the list
### strings = ["a", "as", "bat", "car", "dove", "python"]

loc_mapping = {value: index for index, value in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

**Nested list comprehensions**

In [None]:
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]
all_data

[['John', 'Emily', 'Michael', 'Mary', 'Steven'],
 ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

In [None]:
### a single list containing all names with two or more a’s in them using for loop
names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)
names_of_interest

['Maria', 'Natalia']

In [None]:
### a single nested list comprehension for the above
result = [name for names in all_data for name in names
          if name.count("a") >= 2]
result

['Maria', 'Natalia']

In [None]:
### “flatten” a list of tuples of integers into a simple list of integers using list comprehension
### The for parts of the list comprehension are arranged according to the order of nesting, and any filter condition is put at the end

some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
### the order of the for expressions would be the same if you wrote a nested for loop instead of a list comprehension

flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
### a list comprehension inside a list comprehension, which is perfectly valid

[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# **Functions**

Functions are the primary and most important method of code organization and reuse in Python.

Functions help make your code reusable and more readable by giving a name to a group of Python statements.
    
Functions are declared with the ***def*** keyword.

A function contains a block of code with optional use of the ***return*** keyword. When a line with **return** is reached, the value or expression after return is sent to the context where the function was called.

In [None]:
def my_function(x, y):
    return x + y

In [None]:
my_function(1, 2)
#result = my_function(1, 2)
#result

3

***iNote*** : There is no issue with having multiple return statements. If Python reaches the end of a function without encountering a return statement, None is returned automatically.

In [None]:
def function_without_return(x):
    print(x)

result = function_without_return("hello!")
print(result)

hello!
None


Each function can have **positional arguments** and **keyword arguments**.

Keyword arguments are most commonly used to specify ***default values*** or ***optional arguments***.

In the following function, **x** and **y** are positional arguments while **z** is a keyword argument. This means that the function
can be called in several ways, shown after its definition:

In [None]:
### function definition

def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

In [None]:
### the function can be called in any of the following ways

#my_function2(5, 6, z=0.7)
#my_function2(3.14, 7, 3.5)
my_function2(10, 20)
#my_function2(x=10, y=20)  ### using keywords for passing positional arguments

45.0
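
For reference, here is a sketch that runs all four call styles side by side. The function body is copied from the cell above; only the print calls are added.

```python
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# All of these are valid calls to the same function
print(my_function2(5, 6, z=0.7))     # z passed by keyword
print(my_function2(3.14, 7, 3.5))    # z passed by position
print(my_function2(10, 20))          # z falls back to its default
print(my_function2(y=20, x=10))      # keywords may appear in any order
```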

**Namespaces, Scope, and Local Functions**

Functions can access variables created inside the function as well as those outside the function in higher (or even global) scopes. An alternative and more descriptive name describing a variable scope in Python is a namespace.

Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and immediately populated by the function’s arguments.
After the function is finished, the local namespace is destroyed (with some exceptions that are outside the purview of this chapter).

In [None]:
def func():
  #global aa
  aa = []
  for i in range(5):
    aa.append(i)
    print(aa)

In [None]:
func()
#print(aa)

[0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]


In [None]:
at = []
def func():
    for i in range(5):
        at.append(i)

In [None]:
### Each call to func will modify the list at

func()
at
#func()
#at

[0, 1, 2, 3, 4]

Assigning variables outside of the function’s scope is possible, but those variables must be declared explicitly using either the global or
nonlocal keywords.

Nonlocal allows a function to modify variables defined in a higher level scope that is not global.

In [None]:
a = None
def bind_a_variable():
    global a
    a = []
bind_a_variable()
print(a)

[]
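
The cell above exercises `global`. As a sketch of `nonlocal`, the hypothetical `make_counter` function below (not from the notebook) keeps its state in an enclosing function's namespace, which the inner function rebinds on each call.

```python
# Hypothetical example; make_counter is not from the original notebook.
def make_counter():
    count = 0                 # lives in make_counter's local namespace

    def increment():
        nonlocal count        # rebind the enclosing function's variable
        count += 1
        return count

    return increment

counter = make_counter()
print(counter(), counter(), counter())
```

Without the `nonlocal` declaration, `count += 1` would be treated as an assignment to a new local variable and raise `UnboundLocalError`.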


**Functions are Objects**

Since Python functions are objects, many constructs can be easily expressed that are difficult to do in other languages.

Suppose we were doing some data cleaning and needed to apply a bunch of transformations to the following list of strings:

In [None]:
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]

In [None]:
### use built-in string methods along with the re standard library module for regular expressions

import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result

In [None]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

In [None]:
### An alternative approach is to make a list of the operations you want to apply to a particular set of strings

def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result

In [None]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

In [None]:
### You can use functions as arguments to other functions like the built-in map function, which applies a function to each element of a sequence
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia


# **Anonymous (Lambda) Functions**

Python has support for so-called *anonymous* or *lambda* functions, which are a way of writing functions consisting of a single expression, the result of
which is the return value. They are defined with the ***lambda*** keyword, which has no meaning other than “we are declaring an anonymous function.”

In [None]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

***iNote*** : lambda functions are especially convenient in data analysis because, as you’ll see, there are many cases where data transformation functions will take functions as arguments.
It’s often less typing (and clearer) to pass a lambda function as opposed to writing a full-out function declaration or even assigning the lambda function to a local variable. For example

In [None]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [None]:
strings = ["foo", "card", "bar", "aaaa", "abab"]

In [None]:
### the list of words is sorted based on the number of unique characters in each word
## lambda fn Counts the number of unique elements in the original element x
strings.sort(key=lambda x: len(set(x)))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

# **Generators**

Having a consistent way to iterate over sequences, like objects in a list or lines in a file, is an important Python feature.
This is accomplished by means of the iterator protocol, a generic way to make objects iterable. For example, iterating over a dict yields the dict keys:

In [None]:
some_dict = {"a": 1, "b": 2, "c": 3}
for key in some_dict:
    print(key)

An ***iterator*** is any object that will yield objects to the Python interpreter when used in a context like a for loop. Most methods expecting a list or list-like object will also accept any iterable object. This includes built-in methods such as min, max, and sum, and type constructors like list and tuple:

In [None]:
dict_iterator = iter(some_dict)
dict_iterator

In [None]:
list(dict_iterator)

A ***generator*** is a convenient way, similar to writing a normal function, to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested.
To create a generator, use the ***yield*** keyword instead of ***return*** in a function:

In [None]:
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

In [None]:
gen = squares()
gen

In [None]:
for x in gen:
    print(x, end=" ")
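
The lazy behaviour can be observed step by step with the built-in `next`. This sketch uses a hypothetical `squares_demo` generator, a smaller variant of `squares` without the print call.

```python
def squares_demo(n=3):
    # Same idea as squares above, with a smaller default for brevity
    for i in range(1, n + 1):
        yield i ** 2

gen_demo = squares_demo()
print(next(gen_demo))   # runs the body up to the first yield
print(next(gen_demo))   # resumes and runs to the next yield
print(list(gen_demo))   # consumes whatever is left

# An exhausted generator yields nothing more
print(list(gen_demo))
```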

**Generator expressions**

Another way to make a generator is by using a generator expression. This is a generator analogue to list, dict, and set comprehensions. To create one,
enclose what would otherwise be a list comprehension within parentheses instead of brackets:

In [None]:
gen = (x ** 2 for x in range(100))
gen

In [None]:
sum(x ** 2 for x in range(3))
#dict((i, i ** 2) for i in range(5))

5

**itertools module**

The standard library itertools module has a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, grouping consecutive elements in the sequence by
the return value of the function. Here’s an example:

In [None]:
import itertools
def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, names in itertools.groupby(names, first_letter):
    print(letter, list(names)) # names is a generator
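
Because `groupby` only groups consecutive elements, equal keys that are not adjacent end up in separate groups. A sketch of the difference, using the illustrative names `names_demo`, `unsorted_groups`, and `sorted_groups`:

```python
import itertools

names_demo = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

def first_letter(x):
    return x[0]

# Unsorted input: "Albert" starts a second, separate "A" group
unsorted_groups = [(k, list(g))
                   for k, g in itertools.groupby(names_demo, first_letter)]
print(unsorted_groups)

# Sorting by the same key first gives one group per letter
sorted_names = sorted(names_demo, key=first_letter)
sorted_groups = [(k, list(g))
                 for k, g in itertools.groupby(sorted_names, first_letter)]
print(sorted_groups)
```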

**Errors and Exception Handling**

Many functions only work on certain kinds of input. As an example, Python’s ***float*** function is capable of casting a string to a floating-point number, but fails with ***ValueError*** on improper inputs, for example:

In [None]:
float("1.2345")
#float("something")

Suppose we wanted a version of float that fails gracefully, returning the input argument. We can do this by writing a function that encloses the call to ***float*** in a ***try/except*** block

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [None]:
attempt_float("1.2345")
attempt_float("something")

In [None]:
float((1, 2))

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [None]:
attempt_float((1, 2))

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In [None]:
attempt_float((1, 2))
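
A `try` block can also carry `else` and `finally` clauses. The sketch below uses a hypothetical helper, `parse_with_log`, that records which clauses run: `else` only when no exception was raised, `finally` in every case.

```python
# Hypothetical helper for illustration; not part of the original notebook.
def parse_with_log(x, log):
    try:
        value = float(x)
    except (TypeError, ValueError):
        log.append("failed")
        value = x
    else:
        log.append("succeeded")   # runs only if no exception was raised
    finally:
        log.append("done")        # runs in every case
    return value

log = []
print(parse_with_log("1.2345", log), parse_with_log((1, 2), log))
print(log)
```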

# **Files and the Operating System**

To open a file for reading or writing, use the built-in open function with either a relative or absolute file path.

In [None]:
from google.colab import drive
drive.mount('/content/drive',force_remount=True)
path = "/content/drive/My Drive/ColabNotebooks/first.txt"
f = open(path, encoding="utf-8")

Mounted at /content/drive


***iNote*** : By default, the file is opened in read-only mode 'r'. We can then treat the file handle f like a list and iterate over the lines.
The lines come out of the file with the end-of-line (EOL) markers intact.

In [None]:
lines = [x.rstrip() for x in f]  ## open(path, encoding="utf-8")
lines

['1 2 3 4 5',
 'Rajesh Gopakumar',
 'This is a test file.',
 '1 2 3 4 5 6 7 8 9 10']

***iNote*** : When you use open to create file objects, it is recommended to close the file when you are finished with it. Closing the file releases its resources back to the operating system.

In [None]:
f.close()

One of the ways to make it easier to clean up open files is to use the with statement. This will automatically close the file f when exiting the with block.
Failing to ensure that files are closed will not cause problems in many small programs or scripts, but it can be an issue in programs that need to interact with a large number of files.

In [None]:
with open(path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]
lines

['1 2 3 4 5',
 'Rajesh Gopakumar',
 'This is a test file.',
 '1 2 3 4 5 6 7 8 9 10']
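
To try the `with` statement without Google Drive, the following sketch writes to a temporary directory instead. The file name `demo.txt` and its contents are assumptions made for this example. The `closed` attribute shows that the file is closed as soon as the block exits.

```python
import os
import tempfile

# A temporary file stands in for the Drive path used in this notebook.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")

    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("line one\nline two\n")

    with open(demo_path, encoding="utf-8") as f:
        demo_lines = [x.rstrip() for x in f]
        print(f.closed)   # still open inside the block

    print(f.closed)       # closed automatically on exit
    print(demo_lines)
```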

***Opening a file in "w" (write) mode creates the file if it does not exist and overwrites its contents if it does.***

In [None]:
paths = "/content/drive/My Drive/ColabNotebooks/second.txt"
ff = open(paths, 'w', encoding="utf-8")
ff.write("This is a test file!")
ff.close()

In [None]:
with open(paths, encoding="utf-8") as f:
    contents = [x.rstrip() for x in f]
contents

['This is a test file!']

***File open in "r" read mode (default mode)***

In [None]:
paths = "/content/drive/My Drive/ColabNotebooks/second.txt"
file1 = open(paths, "r")
print("File content: ")
print(file1.read())
print()
file1.close()

File content: 
This is a test file!



***File open in "a" append mode***

In [None]:
paths = "/content/drive/My Drive/ColabNotebooks/second.txt"
ff = open(paths, 'a', encoding="utf-8")
ff.write(" A new line appended!")
ff.close()

In [None]:
#paths = "/content/drive/My Drive/ColabNotebooks/second.txt"
file1 = open(paths, "r")
print("File content after appending: ")
print(file1.read())
print()
file1.close()

File content after appending: 
This is a test file!A new line appended! A new line appended!



For readable files, some of the most commonly used methods are ***read***, ***seek***, and ***tell***.

***read*** returns a certain number of characters from the file. What constitutes a “character” is determined by the file’s encoding (e.g., UTF-8) or simply raw bytes if the file is opened in binary mode.

The ***read*** method advances the file handle’s position by the number of bytes read.

In [None]:
f1 = open(paths, encoding="utf-8")  # Text mode
f1.read(10)
f2 = open(paths, mode="rb")  # Binary mode
f2.read(10)

b'This is a '

***tell*** gives you the current position

In [None]:
#f1.tell()
f2.tell()

10

In [None]:
import sys
sys.getdefaultencoding()

'utf-8'

***seek*** changes the file position to the indicated byte in the file

In [None]:
f1.seek(3)
f1.read()
f1.tell()

61

In [None]:
f1.close()
f2.close()
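
The interplay of `read`, `tell`, and `seek` can be reproduced on a throwaway file. This sketch assumes a temporary file containing the single ASCII line written earlier in the notebook; the path and variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("This is a test file!")

    with open(demo_path, mode="rb") as f:
        head = f.read(10)            # reads 10 bytes
        pos_after_read = f.tell()    # position is now 10
        f.seek(3)                    # jump to byte offset 3
        rest = f.read()              # read from there to the end
        pos_at_end = f.tell()        # position equals the file size

print(head, pos_after_read)
print(rest, pos_at_end)
```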

To write text to a file, you can use the file’s write or writelines methods; to read text back, use read or readlines.

In [None]:
with open("tmp.txt", mode="w") as handle, open(paths) as source:
    handle.writelines(x for x in source if len(x) > 1)

with open("tmp.txt") as f:
    lines = f.readlines()

lines

['This is a test file!A new line appended! A new line appended!']

In [None]:
import os
os.remove("tmp.txt")

In [None]:
with open(path) as f:
    chars = f.read(10)

chars
len(chars)

10

In [None]:
with open(path, mode="rb") as f:
    data = f.read(10)

data

In [None]:
data.decode("utf-8")
data[:4].decode("utf-8")
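
Decoding a slice of bytes only works when the cut falls on a character boundary. The files in this notebook are plain ASCII, so the sketch below uses an illustrative string with one non-ASCII character to show the difference between characters and bytes.

```python
text = "Sueña el rey"            # illustrative string with a non-ASCII character
encoded = text.encode("utf-8")

print(len(text), len(encoded))   # 'ñ' takes two bytes in UTF-8

print(encoded[:3].decode("utf-8"))   # cut falls on a character boundary

try:
    encoded[:4].decode("utf-8")      # cut falls inside the two-byte 'ñ'
except UnicodeDecodeError as exc:
    decode_error = exc
    print(f"UnicodeDecodeError: {exc.reason}")
```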

In [None]:
sink_path = "sink.txt"
with open(path) as source:
    with open(sink_path, "x", encoding="iso-8859-1") as sink:
        sink.write(source.read())

with open(sink_path, encoding="iso-8859-1") as f:
    print(f.read(10))

In [None]:
os.remove(sink_path)

In [None]:
f = open(path, encoding='utf-8')
f.read(5)
f.seek(4)
f.read(1)
f.close()