# Programming and Data Analysis

> Python Tips

Kuo, Yao-Jen <yaojenkuo@ntu.edu.tw> from [DATAINPOINT](https://www.datainpoint.com/)

In [1]:
from random import randint
import csv
import json

## Comprehensions

## What are comprehensions?

> Comprehensions are constructs that allow sequences to be built from other sequences. Python 2.0 introduced list comprehensions and Python 3.0 comes with dictionary and set comprehensions.

Source: <https://python-3-patterns-idioms-test.readthedocs.io/en/latest/>

## Building a list the traditional way

In [2]:
primes = [2, 3, 5, 7, 11]
squared_primes = []
for p in primes:
    squared_primes.append(p**2)
print(squared_primes)

[4, 9, 25, 49, 121]


## Building a list with list comprehension

In [3]:
primes = [2, 3, 5, 7, 11]
squared_primes = [p**2 for p in primes]
print(squared_primes)

[4, 9, 25, 49, 121]


## Building a list with list comprehension and `if` statement

In [4]:
from random import randint

random_integers = [randint(1, 100) for _ in range(20)]
odds_from_random_integers = [ri for ri in random_integers if ri % 2 == 1]
print(random_integers)
print(odds_from_random_integers)

[37, 94, 13, 18, 58, 19, 44, 59, 9, 48, 98, 60, 76, 75, 90, 11, 85, 21, 24, 33]
[37, 13, 19, 59, 9, 75, 11, 85, 21, 33]


## Building a list with list comprehension and `if-else` statement

In [5]:
random_integers = [randint(1, 100) for _ in range(20)]
is_odd_from_random_integers = [True if ri % 2 == 1 else False for ri in random_integers]
print(random_integers)
print(is_odd_from_random_integers)

[33, 42, 53, 1, 66, 42, 48, 89, 97, 82, 11, 69, 80, 44, 88, 45, 60, 1, 52, 40]
[True, False, True, True, False, False, False, True, True, False, True, True, False, False, False, True, False, True, False, False]
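
The conditional expression in a comprehension can produce any value, not only `True` or `False`. A minimal sketch, assuming a fixed list of integers (instead of `randint`) so the result is reproducible; the names `integers`, `parity_labels` and `is_odd` are illustrative only:

```python
# Fixed integers instead of randint() so the result is reproducible (assumption for illustration).
integers = [33, 42, 53, 1, 66]

# The conditional expression sits before `for` and chooses the value emitted for every element.
parity_labels = ["odd" if i % 2 == 1 else "even" for i in integers]

# When only a boolean is needed, the comparison itself already evaluates to True or False.
is_odd = [i % 2 == 1 for i in integers]

print(parity_labels)
print(is_odd)
```

Note the difference in position: an `if` used for filtering goes after the `for` clause, while an `if-else` used for choosing a value goes before it.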


## Building a set with set comprehension

In [6]:
primes = {2, 3, 5, 7, 11}
squared_primes = {p**2 for p in primes}
print(squared_primes)
print(type(squared_primes))

{4, 9, 49, 121, 25}
<class 'set'>


## Building a dictionary with dictionary comprehension

In [7]:
primes = {2, 3, 5, 7, 11}
squared_primes = {p: p**2 for p in primes}
print(squared_primes)
print(type(squared_primes))

{2: 4, 3: 9, 5: 25, 7: 49, 11: 121}
<class 'dict'>
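
Dictionary comprehensions can also iterate over an existing dictionary and apply a filter. A short sketch under the same `primes` example; the names `primes_by_square` and `large_squares` are hypothetical:

```python
primes = [2, 3, 5, 7, 11]
squared_primes = {p: p**2 for p in primes}

# Swap keys and values by iterating over the (key, value) pairs of an existing dictionary.
primes_by_square = {square: p for p, square in squared_primes.items()}

# Keep only the pairs whose value passes a condition.
large_squares = {p: square for p, square in squared_primes.items() if square > 20}

print(primes_by_square)
print(large_squares)
```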


## Generators

## What is a generator in Python?

> A generator is quite like a list comprehension, the difference is that the result of a list comprehension is a collection of values, while the result of a generator is a recipe for producing values.

## Sounds pretty abstract, huh?

![](https://media.giphy.com/media/iKBYnBTbrUV6gRmwYP/giphy.gif)

Source: <https://giphy.com/>

## Replace square brackets with parentheses in the previous list comprehension example

In [8]:
primes = [2, 3, 5, 7, 11]
squared_primes = (p**2 for p in primes)
print(squared_primes)
print(type(squared_primes))

<generator object <genexpr> at 0x7fdbd2c2aac0>
<class 'generator'>


## A generator expression does not actually compute the values until they are needed

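- This saves memory, because values are produced one at a time instead of being stored all at once, and it avoids computing values that are never requested
- However, a generator is single use: once it has been iterated over, it is exhausted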

In [9]:
print(list(squared_primes))
print(list(squared_primes))

[4, 9, 25, 49, 121]
[]
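
The points above can be sketched as follows. This is an illustration, assuming CPython's `sys.getsizeof` as a rough measure of container size; the function name `squares_of` is hypothetical:

```python
import sys

# A list comprehension stores every value; a generator expression stores only the recipe.
squares_list = [n**2 for n in range(10_000)]
squares_gen = (n**2 for n in range(10_000))
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))

# next() asks the generator for one value at a time.
first_value = next(squares_gen)
second_value = next(squares_gen)
print(first_value, second_value)


# A generator function uses `yield` to produce values lazily.
def squares_of(numbers):
    for n in numbers:
        yield n**2


gen = squares_of([2, 3, 5])
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # nothing is left to produce
print(first_pass, second_pass)
```

If the values are needed more than once, either store them in a list or build a fresh generator for each pass.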


## Iterator and Functional Functions

## The reason why we mention generators

- It is because we want to confuse you (X)
- It is because we have to deal with it quite often (O)

## Useful built-in iterator functions in Python

- `range()`
- `enumerate()`
- `zip()`

## Unlike `range`, `enumerate` and `zip` return single-use iterators that behave much like generators

In [10]:
help(enumerate)

Help on class enumerate in module builtins:

class enumerate(object)
 |  enumerate(iterable, start=0)
 |  
 |  Return an enumerate object.
 |  
 |    iterable
 |      an object supporting iteration
 |  
 |  The enumerate object yields pairs containing a count (from start, which
 |  defaults to zero) and a value yielded by the iterable argument.
 |  
 |  enumerate is useful for obtaining an indexed list:
 |      (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.



In [11]:
avenger_movies = ['The Avengers', 'Avengers: Age of Ultron', 'Avengers: Infinity War', 'Avengers: Endgame']
print(enumerate(avenger_movies))
print(list(enumerate(avenger_movies)))

<enumerate object at 0x7fdbd2ca6380>
[(0, 'The Avengers'), (1, 'Avengers: Age of Ultron'), (2, 'Avengers: Infinity War'), (3, 'Avengers: Endgame')]
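
In practice, `enumerate` is usually consumed directly in a `for` loop, unpacking each pair. A small sketch using the `start` parameter shown in the help text; the variable names `position` and `numbered_movies` are illustrative:

```python
avenger_movies = ['The Avengers', 'Avengers: Age of Ultron', 'Avengers: Infinity War', 'Avengers: Endgame']

# Each item yielded by enumerate is a (count, value) pair that can be unpacked in the loop header.
for position, title in enumerate(avenger_movies, start=1):
    print(f"{position}. {title}")

# start=1 makes the count begin at one instead of zero.
numbered_movies = list(enumerate(avenger_movies, start=1))
```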


In [12]:
help(zip)

Help on class zip in module builtins:

class zip(object)
 |  zip(*iterables) --> A zip object yielding tuples until an input is exhausted.
 |  
 |     >>> list(zip('abcdefg', range(3), range(4)))
 |     [('a', 0, 0), ('b', 1, 1), ('c', 2, 2)]
 |  
 |  The zip object yields n-length tuples, where n is the number of iterables
 |  passed as positional arguments to zip().  The i-th element in every tuple
 |  comes from the i-th iterable argument to zip().  This continues until the
 |  shortest argument is exhausted.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and ret

In [13]:
avenger_movies = ['The Avengers', 'Avengers: Age of Ultron', 'Avengers: Infinity War', 'Avengers: Endgame']
release_years = [2012, 2015, 2018, 2019]
print(zip(release_years, avenger_movies))
print(list(zip(release_years, avenger_movies)))

<zip object at 0x7fdbd2c7ee40>
[(2012, 'The Avengers'), (2015, 'Avengers: Age of Ultron'), (2018, 'Avengers: Infinity War'), (2019, 'Avengers: Endgame')]
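
A few common uses of `zip`, sketched with the same two lists; the names `year_by_title`, `truncated_pairs`, `years` and `titles` are hypothetical:

```python
avenger_movies = ['The Avengers', 'Avengers: Age of Ultron', 'Avengers: Infinity War', 'Avengers: Endgame']
release_years = [2012, 2015, 2018, 2019]

# dict() accepts an iterable of (key, value) pairs, so zip is a handy way to build a dictionary.
year_by_title = dict(zip(avenger_movies, release_years))

# zip stops as soon as the shortest input is exhausted.
truncated_pairs = list(zip(release_years, avenger_movies[:2]))

# Unpacking the pairs with * "unzips" them back into separate tuples.
years, titles = zip(*zip(release_years, avenger_movies))

print(year_by_title['Avengers: Endgame'])
print(truncated_pairs)
print(years)
```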


## Useful functional functions

- `map()`
- `filter()`

## What are functional functions?

> Functional functions are functions constructed by applying and composing functions. One characteristic of functional functions is that functions can be passed as arguments.

Source: <https://en.wikipedia.org/wiki/Functional_programming>

In [14]:
help(map)

Help on class map in module builtins:

class map(object)
 |  map(func, *iterables) --> map object
 |  
 |  Make an iterator that computes the function using arguments from
 |  each of the iterables.  Stops when the shortest iterable is exhausted.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.



In [15]:
print(map(float, range(10)))
print(list(map(float, range(10))))

<map object at 0x7fdbd2ca36d0>
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
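
As the help text says, `map` accepts more than one iterable and stops when the shortest is exhausted. A brief sketch; the lists `bases` and `exponents` are made up for illustration:

```python
bases = [2, 3, 5]
exponents = [3, 2, 1, 0]

# pow receives one argument from each iterable; the extra exponent is ignored.
powers = list(map(pow, bases, exponents))
print(powers)

# Like a generator, a map object can only be consumed once.
floats = map(float, range(3))
first_pass = list(floats)
second_pass = list(floats)
print(first_pass, second_pass)
```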


In [16]:
help(filter)

Help on class filter in module builtins:

class filter(object)
 |  filter(function or None, iterable) --> filter object
 |  
 |  Return an iterator yielding those items of iterable for which function(item)
 |  is true. If function is None, return the items that are true.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.



In [17]:
random_bools = [bool(randint(0, 1)) for _ in range(20)]
print(random_bools)
print(filter(None, random_bools))
print(list(filter(None, random_bools)))

[False, True, False, True, False, True, True, False, True, True, False, True, False, True, True, True, True, False, True, False]
<filter object at 0x7fdbd2cadc70>
[True, True, True, True, True, True, True, True, True, True, True, True]
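
Passing `None` as the function keeps every item that Python treats as true, which is not limited to booleans. A minimal sketch with a hypothetical list of mixed values:

```python
# A mix of falsy values (0, "", None, [], 0.0) and truthy values.
mixed_values = [0, 1, "", "python", None, [], [0], 0.0, 3.14]

# filter(None, ...) drops every falsy item.
truthy_values = list(filter(None, mixed_values))

# The equivalent list comprehension.
truthy_values_via_comprehension = [value for value in mixed_values if value]

print(truthy_values)
```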


## Besides passing built-in functions, it is A LOT more common to define our own functions and pass them in

In [18]:
def squared(x):
    return x**2

def larger_than_ten(x):
    return x > 10  # strictly larger than ten, matching the function name

print(list(map(squared, primes)))
print(list(filter(larger_than_ten, primes)))

[4, 9, 25, 49, 121]
[11]


## With `map` and `filter`, it is more convenient to define a disposable function with a lambda expression

- A lambda expression defines an anonymous function
- We can define a lambda expression, use it, then ditch it, all in the same line

In [19]:
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(list(map(lambda x: x**2, primes)))        # so we don't waste a function name for an easy operation
print(list(filter(lambda x: x >= 10, primes)))  # so we don't waste a function name for an easy operation

[4, 9, 25, 49, 121, 169, 289, 361, 529, 841]
[11, 13, 17, 19, 23, 29]
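
`map` and `filter` can be chained, and the result is equivalent to a list comprehension with an `if` clause. A short sketch comparing the two styles; the names `via_functions` and `via_comprehension` are hypothetical:

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# filter first, then map: square only the primes that are at least ten.
via_functions = list(map(lambda x: x**2, filter(lambda x: x >= 10, primes)))

# The same logic as a single list comprehension.
via_comprehension = [p**2 for p in primes if p >= 10]

print(via_functions)
print(via_functions == via_comprehension)
```

Which style to use is largely a matter of readability; many Python programmers prefer the comprehension when both a transformation and a condition are involved.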


## Importing Files

## Dealing with text files with extensions

- `.txt`
- `.csv`
- `.json`

## Open a file with the built-in function `open(file_path, "r")`

```python
file = open(file_path, "r")
# ...
# ...
file.close()
```

## Using `with` as a context manager

```python
with open(file_path, "r") as file:
    # ...
    # ...
```
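
The benefit of `with` is that the file is closed automatically when the block ends, even if an error occurs inside it. A minimal sketch, assuming a scratch file named `example.txt` created in a temporary directory (both are hypothetical and only for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "example.txt")  # hypothetical scratch file

    # Write a line, letting the context manager close the file.
    with open(file_path, "w") as file:
        file.write("hello\n")

    # Read it back; no explicit file.close() is needed.
    with open(file_path, "r") as file:
        content = file.read()
    closed_after_block = file.closed

print(repr(content))
print(closed_after_block)
```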

In [20]:
with open("data/the_shawshank_redemption_summaries.txt", "r") as file:
    the_shawshank_redemption_summaries = file.readlines()
for e in the_shawshank_redemption_summaries:
    print(e)

Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.

Chronicles the experiences of a formerly successful banker as a prisoner in the gloomy jailhouse of Shawshank after being found guilty of a crime he did not commit. The film portrays the man's unique way of dealing with his new, torturous life; along the way he befriends a number of fellow prisoners, most notably a wise long-term inmate named Red.

After the murder of his wife, hotshot banker Andrew Dufresne is sent to Shawshank Prison, where the usual unpleasantness occurs. Over the years, he retains hope and eventually gains the respect of his fellow inmates, especially longtime convict "Red" Redding, a black marketeer, and becomes influential within the prison. Eventually, Andrew achieves his ends on his own terms.

Andy Dufresne is sent to Shawshank Prison for the murder of his wife and her secret lover. He is very isolated and lonely at first, but realizes there 
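
The blank lines in the output above appear because `readlines()` keeps the trailing newline of each line and `print` adds another one. A sketch of one way to strip them, assuming a small scratch file with made-up content rather than the actual summaries file:

```python
import os
import tempfile

sample_lines = ["first summary\n", "second summary\n"]  # made-up content for illustration

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "summaries.txt")  # hypothetical scratch file
    with open(file_path, "w", encoding="utf-8") as file:
        file.writelines(sample_lines)

    # readlines() keeps the newline character at the end of every line.
    with open(file_path, "r", encoding="utf-8") as file:
        raw_lines = file.readlines()

    # A file object is itself an iterator over lines, so it can be used in a comprehension.
    with open(file_path, "r", encoding="utf-8") as file:
        clean_lines = [line.rstrip("\n") for line in file]

print(raw_lines)
print(clean_lines)
```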

In [21]:
with open("data/imdb_top_rated_movies.csv", "r") as file:
    csv_dict_reader = csv.DictReader(file)
    for row in csv_dict_reader:
        print(row)

{'rank': '1', 'title': 'The Shawshank Redemption', 'year': '1994', 'rating': '9.2'}
{'rank': '2', 'title': 'The Godfather', 'year': '1972', 'rating': '9.1'}
{'rank': '3', 'title': 'The Godfather: Part II', 'year': '1974', 'rating': '9.0'}
{'rank': '4', 'title': 'The Dark Knight', 'year': '2008', 'rating': '9.0'}
{'rank': '5', 'title': '12 Angry Men', 'year': '1957', 'rating': '8.9'}
{'rank': '6', 'title': "Schindler's List", 'year': '1993', 'rating': '8.9'}
{'rank': '7', 'title': 'The Lord of the Rings: The Return of the King', 'year': '2003', 'rating': '8.9'}
{'rank': '8', 'title': 'Pulp Fiction', 'year': '1994', 'rating': '8.8'}
{'rank': '9', 'title': 'The Good, the Bad and the Ugly', 'year': '1966', 'rating': '8.8'}
{'rank': '10', 'title': 'The Lord of the Rings: The Fellowship of the Ring', 'year': '2001', 'rating': '8.8'}
{'rank': '11', 'title': 'Fight Club', 'year': '1999', 'rating': '8.8'}
{'rank': '12', 'title': 'Forrest Gump', 'year': '1994', 'rating': '8.7'}
{'rank': '13', 't

{'rank': '145', 'title': 'There Will Be Blood', 'year': '2007', 'rating': '8.2'}
{'rank': '146', 'title': 'The Treasure of the Sierra Madre', 'year': '1948', 'rating': '8.2'}
{'rank': '147', 'title': "Pan's Labyrinth", 'year': '2006', 'rating': '8.1'}
{'rank': '148', 'title': 'A Beautiful Mind', 'year': '2001', 'rating': '8.1'}
{'rank': '149', 'title': 'The Secret in Their Eyes', 'year': '2009', 'rating': '8.1'}
{'rank': '150', 'title': 'Raging Bull', 'year': '1980', 'rating': '8.1'}
{'rank': '151', 'title': 'My Neighbor Totoro', 'year': '1988', 'rating': '8.1'}
{'rank': '152', 'title': 'Chinatown', 'year': '1974', 'rating': '8.1'}
{'rank': '153', 'title': 'Lock, Stock and Two Smoking Barrels', 'year': '1998', 'rating': '8.1'}
{'rank': '154', 'title': 'The Gold Rush', 'year': '1925', 'rating': '8.1'}
{'rank': '155', 'title': 'Shutter Island', 'year': '2010', 'rating': '8.1'}
{'rank': '156', 'title': 'No Country for Old Men', 'year': '2007', 'rating': '8.1'}
{'rank': '157', 'title': 'Di
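
Notice that `csv.DictReader` returns every value as a string, including `rank`, `year` and `rating`. A sketch of converting them, assuming a two-row in-memory sample with the same column names instead of the actual file:

```python
import csv
import io

# A small in-memory sample with the same header as the file above (assumption for illustration).
csv_text = (
    "rank,title,year,rating\n"
    "1,The Shawshank Redemption,1994,9.2\n"
    "2,The Godfather,1972,9.1\n"
)

# io.StringIO lets the csv module read from a string as if it were an open file.
csv_dict_reader = csv.DictReader(io.StringIO(csv_text))

# Convert each column to a suitable type while building the list.
movies = [
    {
        "rank": int(row["rank"]),
        "title": row["title"],
        "year": int(row["year"]),
        "rating": float(row["rating"]),
    }
    for row in csv_dict_reader
]

print(movies[0])
```

When reading a real CSV file, the `csv` module documentation recommends opening it with `newline=""` so that newlines embedded in quoted fields are handled correctly.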

In [22]:
with open("data/imdb_top_rated_movies.json", "r") as file:
    list_of_dicts = json.load(file)
for row in list_of_dicts:
    print(row)

{'rank': 1, 'title': 'The Shawshank Redemption', 'year': 1994, 'rating': 9.2}
{'rank': 2, 'title': 'The Godfather', 'year': 1972, 'rating': 9.1}
{'rank': 3, 'title': 'The Godfather: Part II', 'year': 1974, 'rating': 9.0}
{'rank': 4, 'title': 'The Dark Knight', 'year': 2008, 'rating': 9.0}
{'rank': 5, 'title': '12 Angry Men', 'year': 1957, 'rating': 8.9}
{'rank': 6, 'title': "Schindler's List", 'year': 1993, 'rating': 8.9}
{'rank': 7, 'title': 'The Lord of the Rings: The Return of the King', 'year': 2003, 'rating': 8.9}
{'rank': 8, 'title': 'Pulp Fiction', 'year': 1994, 'rating': 8.8}
{'rank': 9, 'title': 'The Good, the Bad and the Ugly', 'year': 1966, 'rating': 8.8}
{'rank': 10, 'title': 'The Lord of the Rings: The Fellowship of the Ring', 'year': 2001, 'rating': 8.8}
{'rank': 11, 'title': 'Fight Club', 'year': 1999, 'rating': 8.8}
{'rank': 12, 'title': 'Forrest Gump', 'year': 1994, 'rating': 8.7}
{'rank': 13, 'title': 'Inception', 'year': 2010, 'rating': 8.7}
{'rank': 14, 'title': 'Th
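
Unlike CSV, JSON carries type information, so numbers arrive as `int` and `float` without conversion. A minimal sketch using `json.loads` and `json.dumps` on an in-memory string (a one-record sample assumed for illustration):

```python
import json

# A one-record sample in the same shape as the file above (assumption for illustration).
json_text = '[{"rank": 1, "title": "The Shawshank Redemption", "year": 1994, "rating": 9.2}]'

# json.loads parses a string; json.load (without the s) parses an open file.
movies = json.loads(json_text)
print(type(movies[0]["year"]), type(movies[0]["rating"]))

# json.dumps turns Python objects back into a JSON string.
round_trip = json.loads(json.dumps(movies))
print(round_trip == movies)
```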

## Environment and Module Management

## What is environment management?

Python applications will often use packages and modules that do not come as part of the standard library. Applications will sometimes need a specific version of a library. This means it may not be possible for one Python installation to meet the requirements of every application.

## The solution

The solution is to create a virtual environment: a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.
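
One way to see which installation is in use is to inspect the interpreter from Python itself. A rough sketch, assuming an environment created with `venv`; the helper name `in_virtual_environment` is hypothetical, and environments managed by other tools such as conda may not be detected by this comparison:

```python
import sys


def in_virtual_environment():
    """Return True when sys.prefix differs from sys.base_prefix, as it does inside a venv."""
    return sys.prefix != sys.base_prefix


# sys.executable shows which Python installation is running the code.
print(sys.executable)
print(in_virtual_environment())
```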

## What is module management?

A module manager is a tool that installs, updates, and uninstalls modules together with their dependencies.

## The standard environment and module management tools

- Environment manager: `venv`: <https://docs.python.org/3/tutorial/venv.html>
- Module manager: `pip`: <https://pip.pypa.io/en/stable>
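
After installing a module, its version can be checked from Python with the standard library. A small sketch, assuming Python 3.8 or later; the helper name `installed_version` and the package names passed to it are illustrative only:

```python
from importlib import metadata


def installed_version(module_name):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(module_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pip"))  # a version string when pip is installed, otherwise None
print(installed_version("surely-not-an-installed-package"))  # hypothetical name, expected to be missing
```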

## A third-party environment and module management tool

- `Conda` as a manager for both environments and modules.

## What is Conda?

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

## Easiest way to acquire conda

- Conda comes with [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
- Why choose Miniconda over Anaconda? It is leaner.

## Where to run conda?

- Windows users: Use the Anaconda Prompt or Anaconda PowerShell Prompt that comes with the Miniconda installation.
- macOS users: Use Terminal after Miniconda is installed.

## Command to validate conda installation and version

```bash
# run in command line, do not run in jupyter notebooks.
(base) conda --version
```

## Command to install a specific module

```bash
# run in command line, do not run in jupyter notebooks.
(base) conda install jupyter # or run pip install jupyter
```

## Command to validate available environments

```bash
# run in command line, do not run in jupyter notebooks.
(base) conda env list
```

## Command to build a new and isolated environment

After the environment is built, we can list the available environments again to confirm that it exists.

```bash
# run in command line, do not run in jupyter notebooks.
(base) conda create --name prgda python=3.9.7
```

## Command to activate an environment

```bash
# run in command line, do not run in jupyter notebooks.
(base) conda activate prgda
```

## Command to install some third-party modules in current environment

```bash
# run in command line, do not run in jupyter notebooks.
(prgda) conda install ipykernel # or run pip install ipykernel
```

## Command to validate kernels of Jupyter Notebook

```bash
# run in command line, do not run in jupyter notebooks.
(prgda) jupyter kernelspec list
```

## Command to add new kernels for Jupyter Notebook

```bash
# run in command line, do not run in jupyter notebooks.
(prgda) python -m ipykernel install --user --name prgda --display-name "Programming and Data Analysis"
```

## Command to remove kernels of Jupyter Notebook

```bash
# run in command line, do not run in jupyter notebooks.
(prgda) jupyter kernelspec remove prgda
```

## Command to deactivate current environment and get back to base

```bash
# run in command line, do not run in jupyter notebooks.
(prgda) conda deactivate
```

## Command to remove an environment

```bash
# run in command line, do not run in jupyter notebooks.
(base) conda remove --name prgda --all
```

## How to duplicate the environment of assignments?

- Download the `environment.yml` file from the assignment to your working directory.
- Run the following command to create an environment based on it.

```bash
# run in command line, do not run in jupyter notebooks.
(base) conda env create --file environment.yml
```