# Python Standard Library 

Python Standard Library is a collection of modules that are available with every Python installation. It contains many useful modules that can be used to perform various tasks.

You pick and choose the modules you want to use in your program. You can use the Python Standard Library to perform tasks such as reading and writing files, working with the operating system, working with the network, and much more.

Documentation: [Python Standard Library](https://docs.python.org/3/library/index.html)

In [28]:
# https://docs.python.org/3/library/ - location

## Our first import - string module

In [29]:
import string #meaning I have access now to extra string goodies
# now I have access to everything under string namespace
# documentation - https://docs.python.org/3/library/string.html

In [30]:
string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [31]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [32]:
string.digits

'0123456789'

In [33]:
# you do not want to calculate dates yourself, 
# easy to make mistakes with timezones, Georgian calendar, leap years etc
from datetime import datetime # I import only datetime class from datetime module
# https://docs.python.org/3/library/datetime.html

In [34]:
now = datetime.now() #notice datetime needs another datetime
now

datetime.datetime(2024, 11, 7, 17, 14, 39, 57798)

In [35]:
time_passed = datetime.now() - now
time_passed

datetime.timedelta(microseconds=14428)

In [36]:
print(f"Time passed in seconds {time_passed.seconds}")

Time passed in seconds 0


In [37]:
# how many days have passed?
print(f"Time passed in days {time_passed.days}")

Time passed in days 0


In [38]:
# how many total seconds have passed?
print(f"Time passed in seconds {time_passed.total_seconds()}")

Time passed in seconds 0.014428


In [39]:
# we can get individual years, months, days, hours, minutes, seconds
# also date
print(f"Year {now.year}")
print(f"Month {now.month}")
print(f"Day {now.day}")
print(f"Hour {now.hour}")
print(f"Minute {now.minute}")
print(f"Second {now.second}")
print(f"Microsecond {now.microsecond}")
print(f"Date {now.date()}")
print(f"Time {now.time()}")
# day
print(f"Weekday {now.weekday()}") # 0 is Monday, 6 is Sunday

Year 2024
Month 11
Day 7
Hour 17
Minute 14
Second 39
Microsecond 57798
Date 2024-11-07
Time 17:14:39.057798
Weekday 3


## Alias for datetime module

```python

In [40]:
from datetime import datetime as dtime #you can create alias to a longer module  # dtime could be dt, mytime etc, 
really_now = dtime.now() 
really_now

datetime.datetime(2024, 11, 7, 17, 14, 39, 145640)

In [41]:
really_now.weekday() # it is a method not a property

3

In [42]:
# turns out weekday starts on Monday with 0 and Sunday is 6

In [43]:
really_now.year, really_now.month  # just use the dot notation to get the useful methods and properties

(2024, 11)

In [44]:
5+3, 5*3, 5/3, 5//3, 5%3,5**3, 5-3  # basic math included

(8, 15, 1.6666666666666667, 1, 2, 125, 2)

## Math module

Math module provides mathematical functions and constants. 


In [45]:
import math
# all math functions are here
# https://docs.python.org/3/library/math.html

In [46]:
math.ceil(1.01), math.floor(1.99)

(2, 1)

In [47]:
math.factorial(5)

120

In [48]:
math.pi

3.141592653589793

In [49]:
math.cos(math.pi)  # so cos takes radians, so 180 degrees is pi

-1.0

In [50]:
# we can use prod to multiply all elements in a list
# so let's get a multiplication of numbers from 1 to 5 - so factorial
product = math.prod(range(1,6))
print(f"Product of numbers from 1 to 5 is {product} same as {math.factorial(5)}")

Product of numbers from 1 to 5 is 120 same as 120


## Statistics and Counting things

In [51]:
import statistics # now i get access to some statistics goodies

In [52]:
statistics.mean(range(10)) # should be 4.5 since from 0 to 9

4.5

In [53]:
# we can do median meaning middle value when sorted
unsorted_data = [3,1,5,1,6,17,5,6,2,5,6,7,8,9,10]
sorted_data = sorted(unsorted_data)
print(sorted_data)
# so median is 6 since we have 15 elements
# we take the middle value which has index 7
print(f"Median is {statistics.median(sorted_data)}")
print(f"Median is {sorted_data[7]}")

[1, 1, 2, 3, 5, 5, 5, 6, 6, 6, 7, 8, 9, 10, 17]
Median is 6
Median is 6


In [54]:
# median is often times preferred over mean since it is not affected by outliers as much

In [56]:
from collections import Counter  # so we import not the whole library but just one module

In [57]:
cnt = Counter("Abracababra my magic") # so i pass my string to Counter class constructor
cnt.most_common()

[('a', 5),
 ('b', 3),
 ('r', 2),
 ('c', 2),
 (' ', 2),
 ('m', 2),
 ('A', 1),
 ('y', 1),
 ('g', 1),
 ('i', 1)]

In [58]:
sentence = "A quick brown fox jumped over a sleeping fox and that's a wrap"
word_count = Counter(sentence.split()) # we split sentence into a list of words by default whitespace
word_count.most_common()

[('fox', 2),
 ('a', 2),
 ('A', 1),
 ('quick', 1),
 ('brown', 1),
 ('jumped', 1),
 ('over', 1),
 ('sleeping', 1),
 ('and', 1),
 ("that's", 1),
 ('wrap', 1)]

In [59]:
# compare to not splitting then we get characters
char_count = Counter(sentence)
char_count.most_common()

[(' ', 12),
 ('a', 5),
 ('o', 4),
 ('e', 4),
 ('r', 3),
 ('n', 3),
 ('p', 3),
 ('u', 2),
 ('i', 2),
 ('w', 2),
 ('f', 2),
 ('x', 2),
 ('d', 2),
 ('s', 2),
 ('t', 2),
 ('A', 1),
 ('q', 1),
 ('c', 1),
 ('k', 1),
 ('b', 1),
 ('j', 1),
 ('m', 1),
 ('v', 1),
 ('l', 1),
 ('g', 1),
 ('h', 1),
 ("'", 1)]

## Random module

In [61]:
import random
# https://docs.python.org/3/library/random.html
# we used random to generate random numbers not surprisingly..

In [68]:
# numbers are not really random but pseudo-random
random.seed(2024) # pick you own starting seed guarantees same specific pseudo-random values, 9000, 42, 23, etc etc
# https://www.youtube.com/watch?v=1cUUfMeOijg
# good for testing but obviously this line not good if you need fresh values
random.random(), random.randint(1,6), random.randint(1,6), random.randint(1,6), random.randint(1,6), random.randint(1,6)

(0.47009071843107064, 6, 5, 3, 2, 6)

In [44]:
random.randint(1,6) # start AND end is both included, thus 6 is included, quite rare in programming

1

In [70]:
max_throws = 10_000
dice_throws = [random.randint(1,6) + random.randint(1,6) for _ in range(max_throws)]

In [71]:
statistics.mean(dice_throws), statistics.median(dice_throws), statistics.mode(dice_throws)  # mode is most frequently seen item
# here these values will actually be very close to each other

(6.9725, 7.0, 7)

In [73]:
# standard deviation is a measure of how spread out numbers are
print(f"Standard deviation is {statistics.stdev(dice_throws)}")

Standard deviation is 2.415269571437166


In [74]:
# let's count our throws
dice_cnt = Counter(dice_throws)
dice_cnt

Counter({7: 1666,
         6: 1400,
         8: 1367,
         5: 1136,
         9: 1082,
         4: 870,
         10: 808,
         11: 572,
         3: 559,
         2: 270,
         12: 270})

In [75]:
dice_cnt.most_common()

[(7, 1666),
 (6, 1400),
 (8, 1367),
 (5, 1136),
 (9, 1082),
 (4, 870),
 (10, 808),
 (11, 572),
 (3, 559),
 (2, 270),
 (12, 270)]

In [None]:
# there are external libraries such as matplotlib, seaborn, plotly, bokeh, etc that could be used
# to plot the data
# we will cover external libraries in the near future

In [76]:
my_alphabet = list(string.ascii_lowercase)
my_alphabet

['a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']

### Shuffling items in a list

In [82]:
# you do not want to write your own shuffle algorithm 
# it is easy to make mistakes
print(f"Original alphabet {my_alphabet}")
random.shuffle(my_alphabet) # we shuffle all 26 items IN PLACE
# this means that my_alphabet will be shuffled
# this will be in place meaning my_alphabet will get shuffled
print(f"Shuffled alphabet {my_alphabet}")

Original alphabet ['l', 'x', 'v', 'h', 'w', 'i', 'q', 'a', 't', 'n', 'z', 'p', 'k', 'u', 'e', 'r', 'y', 'c', 'b', 'j', 'd', 'o', 'f', 's', 'm', 'g']
Shuffled alphabet ['l', 'z', 't', 'x', 'e', 'c', 'h', 'u', 'i', 'j', 'w', 's', 'a', 'r', 'y', 'q', 'v', 'k', 'f', 'n', 'g', 'o', 'b', 'm', 'p', 'd']


In [62]:
random.choice(my_alphabet)

'h'

In [67]:
random.choices(my_alphabet, k=3)

['l', 'e', 'z']

In [85]:
original_alphabet = list(string.ascii_letters)# we make a copy of the original alphabet
new_alphabet_with_repeats = random.choices(my_alphabet, k=len(my_alphabet)) 
# returns a new shuffled list with unique picks keeping old one untouched
my_alphabet[:10], new_alphabet_with_repeats[:10]

(['l', 'z', 't', 'x', 'e', 'c', 'h', 'u', 'i', 'j'],
 ['u', 'i', 's', 'c', 'p', 'y', 'a', 'e', 'y', 'u'])

In [87]:
# if you do not want repeats you can use sample
# naturally if you try to sample more than you have you will get an error actually
new_alphabet_no_repeats = random.sample(my_alphabet, k=len(my_alphabet))
new_alphabet_no_repeats

['j',
 't',
 'x',
 'f',
 'n',
 'l',
 'o',
 'i',
 's',
 'a',
 'w',
 'g',
 'e',
 'r',
 'b',
 'm',
 'p',
 'q',
 'z',
 'd',
 'h',
 'u',
 'c',
 'k',
 'y',
 'v']

## System modules

System modules deal with the operating system and file system.

In [88]:
import os

In [89]:
os.listdir() # list all the files/directories in your current directory # in this case this is my Google colab directory

['alice_queen.txt',
 'Jupyter Tips.ipynb',
 'myAprilMod.py',
 'myAprilPackage',
 'MyMod.ipynb',
 'Practice_1.ipynb',
 'Python Classes.ipynb',
 'Python Dictionaries.ipynb',
 'Python File IO.ipynb',
 'Python File Operations 2 Binary Files and Pickle.ipynb',
 'Python Flow Control.ipynb',
 'Python Flow Control.md',
 'Python Functions.ipynb',
 'Python Functions.md',
 'Python Lists.ipynb',
 'Python Reading Writing Files.md',
 'Python Sets.ipynb',
 'Python Standard Library.ipynb',
 'Python Strings.ipynb',
 'Python Variables and Data Types.ipynb',
 'Python_Classes.ipynb',
 'Python_Dictionaries.ipynb',
 'Python_File_IO.ipynb',
 'Python_Flow_Control.ipynb',
 'Python_Functions.ipynb',
 'Python_Lists_and_Tuples.ipynb',
 'Python_Loops.ipynb',
 'Python_Modules_Packages_External_Libraries.ipynb',
 'Python_Sets.ipynb',
 'Python_Standard_Library.ipynb',
 'Python_Strings.ipynb',
 'Python_Variables_and_Data_Types.ipynb',
 'RunOtherNotebooks.ipynb',
 'somefile.txt',
 'Tuples.ipynb',
 'tuples.py',
 '__pyca

In [90]:
# sometimes it is important to know where you are in the file system
os.getcwd() # get current working directory
# should be different to you

'd:\\Github\\RTU_Python_720_Fall_2020\\core'

In [None]:
# for files there is newer Pathlib library which is recommended

In [91]:
from pathlib import Path

In [93]:
csv_files = [f for f in Path("sample_data").glob("*.csv") if f.is_file()]  # we look in specific subdirectory
csv_files # if not using colab you will need to change the path

[WindowsPath('sample_data/test.csv')]

In [95]:
all_txt_files = [f for f in Path("./").rglob("*.txt") if f.is_file()] # we go through all subdirectories
print(f"How many txt files {len(all_txt_files)}")
all_txt_files

How many txt files 2


[WindowsPath('alice_queen.txt'), WindowsPath('somefile.txt')]

## CSV library

Personally I prefer the external library pandas for working with CSV files, but the csv module is also available in the Python Standard Library.

In [79]:
import csv
# usually external pandas is preferred, but for smaller tasks csv library might be enough
# https://docs.python.org/3/library/csv.html

In [80]:
all_csv_files[0]

PosixPath('sample_data/california_housing_test.csv')

In [81]:
with open(all_csv_files[0], newline='\n') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    my_data = list(spamreader) # spamreader is a stream of data so we might need to commit it to memory

In [82]:
len(my_data)

3001

In [83]:
my_data[:3] # so we have our data we would need to convert the strings to numbers
# but again pandas library would be better suited for more advanced analysis

[['longitude',
  'latitude',
  'housing_median_age',
  'total_rooms',
  'total_bedrooms',
  'population',
  'households',
  'median_income',
  'median_house_value'],
 ['-122.050000',
  '37.370000',
  '27.000000',
  '3885.000000',
  '661.000000',
  '1537.000000',
  '606.000000',
  '6.608500',
  '344700.000000'],
 ['-118.300000',
  '34.260000',
  '43.000000',
  '1510.000000',
  '310.000000',
  '809.000000',
  '277.000000',
  '3.599000',
  '176500.000000']]

## Shutil for file operations

We can shutil module to perform file operations such as copying, moving, and deleting files.

We can also make zip files using the shutil module.

In [85]:
import shutil

shutil.make_archive("my_zip_archive", "zip", "sample_data")

'/content/my_zip_archive.zip'

In [None]:
# there is also zipfile standard library for more operations with zip files

In [None]:
# there are many more standard library modules - all found  at https://docs.python.org/3/library/

## Itertools module - for efficient looping

Itertools module provides a set of functions for working with iterators. It provides many functions that can be used to create and work with iterators.

In [97]:
# let's start with product
# we want to have a Cartesian product of two lists
# we have a list of people
people = ["Alice", "Bob", "Charlie", "David"]
# and we have a list of drinks
drinks = ["Coke", "Beer", "Water"]
prices = [0.99, 4.99]
# we could use two nested loops to get all combinations

# instead it is better to use product
from itertools import product
all_combinations = list(product(people, drinks, prices))
print("All combinations")
all_combinations

All combinations


[('Alice', 'Coke', 0.99),
 ('Alice', 'Coke', 4.99),
 ('Alice', 'Beer', 0.99),
 ('Alice', 'Beer', 4.99),
 ('Alice', 'Water', 0.99),
 ('Alice', 'Water', 4.99),
 ('Bob', 'Coke', 0.99),
 ('Bob', 'Coke', 4.99),
 ('Bob', 'Beer', 0.99),
 ('Bob', 'Beer', 4.99),
 ('Bob', 'Water', 0.99),
 ('Bob', 'Water', 4.99),
 ('Charlie', 'Coke', 0.99),
 ('Charlie', 'Coke', 4.99),
 ('Charlie', 'Beer', 0.99),
 ('Charlie', 'Beer', 4.99),
 ('Charlie', 'Water', 0.99),
 ('Charlie', 'Water', 4.99),
 ('David', 'Coke', 0.99),
 ('David', 'Coke', 4.99),
 ('David', 'Beer', 0.99),
 ('David', 'Beer', 4.99),
 ('David', 'Water', 0.99),
 ('David', 'Water', 4.99)]