# Modules and Packages

- A package is a collection of reusable code (functions, classes, and other functionality) that solves a particular problem so you do not have to write it yourself

**These could be**
* a CSV reader package
* a pretty-printer for the command line
* PySpark for data processing
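
To make the first two examples concrete, here is a minimal sketch that uses only built-in packages (`csv`, `io`, and `pprint`). The CSV text and the variable names `raw` and `rows` are made up for illustration and are not part of the original notebook.

```python
import csv
import io
from pprint import pformat

# Hypothetical CSV content, kept in memory so no file is needed
raw = "name,language\nrequests,Python\nexpress,JavaScript\n"

# csv.DictReader turns each data row into a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(raw)))

# pformat returns a nicely formatted string of the parsed rows
print(pformat(rows))
```

Both packages ship with Python, so nothing has to be installed to run this.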


- Packages are hosted in a repository, much like npm hosts packages for Node.js
- Popular package repositories are **PyPI** and **Conda (Anaconda.org) with its various *channels***

> Up to this point we have only used built-in packages. In the real world, we rely on many outside packages to help solve problems. To use them, we install them with `pip` or `conda` and then import them in our code


**I prefer conda / Anaconda because I have used it in previous jobs. Pip is fine and is the more widely used tool. Conda also offers a lot of data packages, which is another reason I use it along with these Jupyter Notebooks**

## Outside package usage example

- Install an HTTP request library called **[requests](https://requests.readthedocs.io/en/latest/user/quickstart/)** which is used to make calls to third-party APIs

> Since I am using Anaconda and Jupyter Notebooks, requests is already pre-installed, so I could have skipped the install. However, real application workloads are typically written in IDEs and deployed to target environments where Jupyter is not used, so it is worth demonstrating how to install a package. See the [Conda docs](https://docs.conda.io/en/latest/) to learn more about the conda package and environment manager
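
One way to check whether a package is already present before installing it is sketched below, using the standard library's `importlib`. The helper names `installed_version` and `is_importable` are my own and not part of any library; treat this as a rough check rather than a replacement for `conda list` or `pip list`.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    return util.find_spec(module_name) is not None


# The result depends on the environment this runs in
print("requests importable:", is_importable("requests"))
print("requests version:", installed_version("requests"))
```

If the version comes back as `None`, the package still needs to be installed in the active environment.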

In [2]:
conda install -c conda-forge requests

Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 23.3.1
  latest version: 23.5.0

Please update conda by running

    $ conda update -n base -c defaults conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.5.0



## Package Plan ##

  environment location: /Users/tannerbarcelos/anaconda3

  added / updated specs:
    - requests


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2023.5.7           |     pyhd8ed1ab_0         149 KB  conda-forge
    openssl-1.1.1u             |       h53f4e23_0         1.6 MB  conda-forge
    requests-2.31.0            |     pyhd8ed1ab_0          55 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         1.8 MB

The followi

- Use the requests library to programmatically request data from the GitHub public timeline

In [25]:
import requests

# Request the endpoint (the timeout keeps the call from hanging indefinitely)
url = 'https://api.github.com/events'
public_tl = requests.get(url, timeout=10)

# Stop early with a clear error if GitHub returned a 4xx / 5xx status
public_tl.raise_for_status()

# Parse the JSON response body into Python objects (here, a list of dicts)
tl_json = public_tl.json()

# Let's analyze the data
# print(tl_json)

for blob in tl_json:
    # print(type(blob))  # each entry in the JSON body is converted to a dict for us by the .json() method! Nice!

    # get event id
    print(blob.get('id'))

    # get the user that produced the event (using key indexing instead of .get(); we see this a lot when extracting data from complex, nested structures)
    print(blob['actor']['display_login'])

30108179695
bonifield
30108179664
proxy4parsing
30108179672
qkr1839
30108179658
apus116
30108179646
github-actions
30108179669
sportstvdev
30108179649
Dimiqhz
30108179627
rmayr
30108179606
thevickypedia
30108179579
github-actions
30108179611
Lunakepio
30108179616
costellobot
30108179580
skasturi
30108179607
github-actions
30108179593
TylerHendrickson
30108179597
Monica-Macharia
30108179574
github-actions
30108179565
jamsheerply
30108179542
zirklbch
30108179546
mihir-gautam
30108179544
gouniLee
30108179560
pctiope
30108179534
thiwashwe
30108179530
aws-aemilia-fra
30108179533
github-actions
30108179540
Baeg-won
30108179497
predictcrypto
30108179536
supershell2019
30108179522
jatinchowdhury18
30108179519
github-actions
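
The loop above mixes `.get()` with direct key indexing, and indexing raises a `KeyError` if a key is missing. As a hedged sketch, the same extraction could be wrapped in a small helper that tolerates missing fields. The function name `summarize_events`, the placeholder strings, and the `sample` data are all hypothetical; the sample only mimics the two fields used above and is not real API output.

```python
from typing import Any


def summarize_events(events: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (event id, actor login) pairs, tolerating missing fields."""
    summary = []
    for event in events:
        event_id = event.get("id", "<no id>")
        # Fall back to an empty dict so the nested lookup cannot raise
        actor = event.get("actor") or {}
        login = actor.get("display_login", "<unknown>")
        summary.append((event_id, login))
    return summary


# Hypothetical sample shaped like the parsed JSON, so no network call is needed
sample = [
    {"id": "1", "actor": {"display_login": "octocat"}},
    {"id": "2"},  # no actor key at all
]

for event_id, login in summarize_events(sample):
    print(event_id, login)
```

With real data, `summarize_events(tl_json)` would take the place of the loop body; whether silent placeholders are preferable to a loud `KeyError` depends on the use case.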


**This is how we download and use external packages to do things beyond Python's core capabilities out of the box, or to avoid writing everything from scratch by using a library that makes our lives easier**

### Modules

- In Python, every `.py` file is a module. This means each file can be imported into another, which lets us organize a codebase as a hierarchy of modules and files that separates concerns
- **Packages are a collection of modules**! (When publishing a package, you bundle all of its modules into a single distributable archive, such as a wheel)
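
As a quick, self-contained sketch of both points, the snippet below writes a tiny package to a temporary directory and imports a module from it. The names `demo_pkg_sketch`, `greetings.py`, and `hello` are hypothetical and chosen only for illustration; a real project would keep these files on disk next to its entry point instead of generating them.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_and_import_demo() -> str:
    """Create a throwaway package on disk, import one of its modules, and call it."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg_sketch"
        pkg_dir.mkdir()

        # An __init__.py file marks the directory as a regular package
        (pkg_dir / "__init__.py").write_text("")

        # Every .py file inside the package is a module
        (pkg_dir / "greetings.py").write_text(
            "def hello(name):\n    return f'Hello, {name}!'\n"
        )

        # Make the temporary directory importable for the duration of the demo
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            greetings = importlib.import_module("demo_pkg_sketch.greetings")
            return greetings.hello("world")
        finally:
            # Clean up so the demo leaves no trace in the interpreter
            sys.path.remove(tmp)
            for name in ("demo_pkg_sketch.greetings", "demo_pkg_sketch"):
                sys.modules.pop(name, None)


print(build_and_import_demo())
```

In everyday code the same import would simply be written as `from demo_pkg_sketch import greetings`, with the directory sitting alongside the script that uses it.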

**Let's see how to create multiple modules and use them within a root file called `main.py`**

> This has to be done using a text editor / IDE, so within the folder of this Notebook I will create a directory called `mod_demo`, and within it will be multiple modules to show how to import and use them