# Project Structure

Having a good structure in your software is what will ultimately make a huge difference. As your software grows, you need to be able to easily identify where each of your files should be placed in order to:

- Know where the helper functions are located
- Write the correct `import` statements without relying on a map to know where your modules are
- Know where to run unit and integration tests when you are working in a CI/CD environment
- Give your team a common convention
- Publish your code, since publishing requires meeting certain organizational standards
- Make your code more user-friendly
- Create well-documented packages

And probably my favourite one:

- It feels good!

If you are in Colab and don't have the examples with you, run the following code to download the folder with the examples. Remember that you can access `.py` files in Colab and modify them!

```python
!wget "https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/structure_example.zip"
import zipfile

# Extract the downloaded examples into a folder called "example"
with zipfile.ZipFile("structure_example.zip", "r") as zip_ref:
    zip_ref.extractall("example")
```

> ## Code structure is crucial for designing, developing, and maintaining a program.

Despite how important structure is, Python is very flexible when it comes to structuring your program.


This is good, since we can use a different structure depending on the purpose of our program. However, it also creates all sorts of debates, since everyone has a different perspective on how to structure code.

### Which structure should I use?

This notebook provides some examples of structures that you might follow. They are the bare minimum your program should comply with.

# Different Structures for Different Purposes

Depending on your end goal, you will use a different project skeleton. Fear not, for I am here to tell you that they are all built very similarly, and once you understand them, you will easily know when to use each one.

<sub>_Disclaimer: These structures are intended for CLI applications. For web apps that rely on Django or Flask, the structure is similar, but much more content has to be added._</sub>

## Single Script

Projects do not necessarily have tons of classes. If your project doesn't have many dependencies, you can go with a simple structure that keeps everything in a single directory:
```
root/
│
├── project.py
└── test.py
```

However, this layout is hard to distribute: you cannot send a script and expect everyone to have the same libraries you used during your project.

We should add a couple of files that make this 'publishable':
```
root/
│
├── project.py
├── test.py
├── requirements.txt
└── setup.py
```



This looks better, but a few things are still missing. A short documentation file is usually needed. Say you want to publish the project on GitHub: you can add a `README.md` file, plus a `LICENSE` file that states the permissions, limitations, and conditions of your code. You can also use a `.gitignore` file to keep certain files untracked and, therefore, unpublished.

```
root/
│
├── project.py
├── test.py
├── requirements.txt
├── setup.py
├── setup.cfg
├── README.md
├── LICENSE
└── .gitignore
```

Let's go file by file:

- `root`: This is the directory of your project. In this layout, everything will be contained in a single directory
- `project.py`: The script that you are distributing. When you distribute it, avoid using names like `main.py`. <sub><sup>That is fine if you are still in the development process</sup></sub>
- `test.py`: As the name suggests, this script contains the tests that check your code works as intended.
- `requirements.txt`: This file has all the external dependencies and their versions.
- `setup.py`: It describes the metadata of your project as well as the packages that need to be installed to run your script. As opposed to `requirements.txt`, `setup.py` indicates that your project has likely been packaged and distributed with setuptools, the standard tool for distributing Python packages (it builds on Distutils, which has been deprecated and was removed from the standard library in Python 3.12).
    - For the first exercises we will stick to `requirements.txt`, but when publishing a package, include `setup.py` and `setup.cfg` ([link](https://setuptools.readthedocs.io/en/latest/userguide/declarative_config.html)), or `pyproject.toml` ([link](https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html)). Some projects also use tools such as `pipenv` ([link](https://pipenv.pypa.io/en/latest/)) to manage their dependencies in a clean and concise manner. More examples in this [link](https://packaging.python.org/tutorials/packaging-projects/#packaging-python-projects).
- `README.md`: This file documents the purpose, usage, and (usually) the content of your project. Many repos lack enough information; check this [page](https://www.makeareadme.com) to see how to create a good README file.<br>_Small tip: get familiar with Markdown or reStructuredText files. They allow you to create professional README files!_
- `LICENSE`: This text describes the license that you are using for the project. If you are distributing a project, it is a good idea to add one. `GitHub` has shortcuts to add them to your repo, but if you are unsure of which one you should use, check this [page](https://ufal.github.io/public-license-selector/) or this [one](https://choosealicense.com)
- `.gitignore`: If you don't want to track certain files, for example those that are not relevant to other users such as `__pycache__` or `.DS_Store`, or your personal virtual environment if you created it inside your repo, list them in `.gitignore` as plain text.
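Since `requirements.txt` is plain text with one package specifier per line, a common (though debated) pattern is to reuse it from `setup.py` instead of listing dependencies twice. The sketch below assumes that approach: `parse_requirements` and `read_requirements` are hypothetical helper names, the parsing is deliberately simplified, and the pinned packages in the usage example are placeholders, not the real dependencies of the examples.

```python
from pathlib import Path


def parse_requirements(text: str) -> list[str]:
    """Return the package specifiers found in requirements.txt-style text.

    Simplified on purpose: handles blank lines, full-line and inline comments,
    and skips pip options such as `-r other.txt` or `-e .`.
    """
    requirements = []
    for raw_line in text.splitlines():
        # Drop inline comments such as "pkg==1.0  # why we need it"
        line = raw_line.split(" #", 1)[0].strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank line, comment, or pip option
        requirements.append(line)
    return requirements


def read_requirements(path: str = "requirements.txt") -> list[str]:
    """Read a requirements file from disk and parse it."""
    return parse_requirements(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    example = (
        "requests==2.31.0\n"
        "# scraping\n"
        "beautifulsoup4>=4.12  # HTML parsing\n"
        "-r dev-requirements.txt\n"
    )
    print(parse_requirements(example))  # ['requests==2.31.0', 'beautifulsoup4>=4.12']
```

Inside `setup.py` you could then pass `install_requires=read_requirements()` to `setup()`. For anything beyond simple pins, letting pip read the file directly with `pip install -r requirements.txt` is more robust.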

## MiniChallenge 1:

Create a GitHub repo using the files in the `examples` folder

The repo should contain:
- A `.gitignore` file
- A `LICENSE` file
- A `README.md` file
- `celebrities_births.py`
- `requirements.txt`
- `test_celebrities_birth.py`

Most of the files are easy to create, so the ones you really need to write are `requirements.txt` and `test_celebrities_birth.py`.

Regarding the tests, they don't have to be very specific. You can, for example, test whether a date is valid. Include at least three tests.
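As a starting point, here is a minimal sketch of such a test file using the standard `unittest` module. It assumes a hypothetical `is_valid_date(day, month, year)` helper; it is defined inline only so the snippet runs on its own, whereas in your repo you would import the equivalent function from `celebrities_births.py` (its real name and signature may differ).

```python
import datetime
import unittest


def is_valid_date(day: int, month: int, year: int) -> bool:
    """Hypothetical helper: in your repo this would be imported from celebrities_births.py."""
    try:
        datetime.date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


class TestIsValidDate(unittest.TestCase):
    def test_regular_date_is_valid(self):
        self.assertTrue(is_valid_date(1, 1, 2000))

    def test_leap_day_only_valid_in_leap_years(self):
        self.assertTrue(is_valid_date(29, 2, 2020))
        self.assertFalse(is_valid_date(29, 2, 2021))

    def test_month_out_of_range_is_invalid(self):
        self.assertFalse(is_valid_date(1, 13, 2021))
```

You can run it from the repo root with `python -m unittest test_celebrities_birth.py`.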

## Single Package

In your projects, you are very unlikely to have a single script (and if you do, consider refactoring your code!). As such, your main script will depend on modules or packages.

In this example, all your functions and classes have been moved to different modules, so instead of placing all of them in the main directory, we will create a subdirectory.

So, __DON'T__ do this:
```
root/
│
├── project.py
├── project_module_1.py
├── project_module_2.py
├── test.py
├── requirements.txt
├── README.md
├── LICENSE
└── .gitignore
```

__DO__ this:

```
root/
│
├── project
│   ├── __init__.py
│   ├── project.py
│   ├── module_1.py
│   └── module_2.py
│
├── test.py
├── requirements.txt
├── README.md
├── LICENSE
└── .gitignore
```

Now that you have more functions and classes, it is a good idea to split your tests as well. In this case, I included one integration test file (`test_project.py`) and one unit test file per module:

```
root/
│
├── project
│   ├── __init__.py
│   ├── __main__.py
│   ├── module_1.py
│   └── module_2.py
│
├── test
│   ├── __init__.py
│   ├── test_project.py
│   ├── test_module_1.py
│   └── test_module_2.py
│   
├── requirements.txt
├── README.md
├── LICENSE
└── .gitignore
```

_In some cases you might find the test folder inside the project folder. This is up to the developer._

Let's review this one more time:

- `project/`: The package folder, which contains the main script and its modules.
- `project/__init__.py`: This is what tells Python that your folder is not simply a directory, but a __package__.
- `project/__main__.py`: This one is NOT necessary, but we will add it so that we can play around with the package. If we don't include it, the package cannot be run as a module, and we would need to add a new file to call the package. If we add `__main__.py`, we can run the package directly from the CLI by typing `python -m project`
- `project/module_*.py`: Much of the logic in your main script is now rearranged into smaller pieces of code. Remember the principles of abstraction and encapsulation when deciding how to separate them
- `test/__init__.py`: This file is only there for convenience: it lets you run all the tests from the root directory. However, when you publish your package, you can get rid of it; we will see later why removing it is convenient.
- `test/test_*.py`: Just as the main package has been divided into modules, tests can also be divided to give finer granularity. Add one test file per module, and another one for integration testing
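To see the `__main__.py` mechanism in action without touching your real project, the sketch below writes a throwaway `project/` package (with hypothetical, deliberately tiny file contents) into a temporary folder and runs `python -m project` there through `subprocess`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical contents for each file in project/
PACKAGE_FILES = {
    "__init__.py": '"""project: a toy package used to try out `python -m`."""\n',
    "module_1.py": "def greet(name):\n    return f'Hello, {name}!'\n",
    # __main__.py is the file that `python -m project` executes
    "__main__.py": "from project.module_1 import greet\n\nprint(greet('world'))\n",
}


def run_package_as_module() -> str:
    """Write project/ into a temporary folder and run `python -m project` from there."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "project"
        package_dir.mkdir()
        for filename, body in PACKAGE_FILES.items():
            (package_dir / filename).write_text(body)
        # cwd=tmp plays the role of the root/ folder in the tree above
        result = subprocess.run(
            [sys.executable, "-m", "project"],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


if __name__ == "__main__":
    print(run_package_as_module(), end="")  # Hello, world!
```

Without `__main__.py`, the same command fails, because Python finds the package but has nothing to execute in it.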



## MiniChallenge 2

Using the files from the previous challenge, recreate the above structure. For your convenience, copy the `__main.py` file from the examples folder and rename it to `__main__.py`.

# Publishing your Package

Now it is time to show the world your skills. You will publish the project we have just created. We only need to add two files and go through a series of simple steps. The new structure will look like this:

```
root/
│
├── project
│   ├── __init__.py
│   ├── example.py
│   ├── module_1.py
│   └── module_2.py
│
├── test
│   ├── test_project.py
│   ├── test_module_1.py
│   └── test_module_2.py
│   
├── README.md
├── setup.py
├── LICENSE
└── .gitignore
```

Can you see the differences? It is actually quite easy to reach this step if you had the previous structure!

Let's review it:

First of all, we added the following files:

- `setup.py`: This file indicates that our package(s) will likely be distributed using setuptools (or the older Distutils). With this file present, running `pip install .` from the root folder finds the packages in the directory and installs them the same way you install any other library.<br>To write it, import `setup` from the `setuptools` library. You can also import `find_packages`, which searches your directory for packages (folders containing an `__init__.py`) and returns their names as a list that you pass to `setup()`, so pip installs them. The file should look like this:

<p align=center><img src=images/setup.png width=700></p>

- `project/example.py`: This file has exactly the same content as the previous `__main__.py`. We rename it to avoid creating undesired behaviour for users who install our package.

Additionally, notice that we removed the following files:
- `test/__init__.py`: If included, `find_packages` in `setup.py` will identify the `test` folder as a package, and the tests will be installed as part of the distributed package.
- `requirements.txt`: We already have `setup.py`, so this is unnecessary
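To make the `test/__init__.py` point concrete, here is a deliberately simplified stand-in for `find_packages` (it is not setuptools' actual code): a folder counts as a package only if it contains `__init__.py`, and folders that are not packages are not searched. The `demo` helper and the file layout it creates are hypothetical.

```python
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path


def find_packages_simplified(where: Path, exclude: tuple[str, ...] = ()) -> list[str]:
    """Simplified stand-in for setuptools.find_packages, NOT its real implementation."""
    found: list[str] = []

    def walk(directory: Path, prefix: str) -> None:
        for child in sorted(directory.iterdir()):
            if not child.is_dir() or not (child / "__init__.py").is_file():
                continue  # plain folders (and everything inside them) are skipped
            dotted = prefix + child.name
            if not any(fnmatchcase(dotted, pattern) for pattern in exclude):
                found.append(dotted)
            walk(child, dotted + ".")  # look for subpackages

    walk(Path(where), "")
    return found


def demo(test_has_init: bool, exclude: tuple[str, ...] = ()) -> list[str]:
    """Recreate the publishing layout in a temporary folder and report what is found."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel_path in ["project/__init__.py", "project/example.py", "test/test_project.py"]:
            (root / rel_path).parent.mkdir(parents=True, exist_ok=True)
            (root / rel_path).write_text("")
        if test_has_init:
            (root / "test" / "__init__.py").write_text("")
        return find_packages_simplified(root, exclude)


if __name__ == "__main__":
    print(demo(test_has_init=False))  # ['project']
    print(demo(test_has_init=True))   # ['project', 'test']
    print(demo(test_has_init=True, exclude=("test", "test.*")))  # ['project']
```

If you prefer to keep `test/__init__.py`, the real `find_packages` also accepts an `exclude` argument, e.g. `find_packages(exclude=["test", "test.*"])`.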


You will often find packages that include a file called `setup.cfg`. It serves the same purpose as `setup.py`, but describes the package configuration declaratively, in an INI-style file, instead of in Python code. You can see an example [here](https://setuptools.pypa.io/en/latest/userguide/declarative_config.html)
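Because `setup.cfg` uses the INI format, Python's built-in `configparser` can read it. The sketch below shows what a declarative configuration for this notebook's package might look like; the metadata values and dependency names are placeholders, and `summarize_setup_cfg` is a hypothetical helper for inspecting the file, not something setuptools provides.

```python
import configparser

# Hypothetical setup.cfg for the package built in this notebook.
# The dependency names are placeholders: list whatever your own code imports.
SETUP_CFG = """\
[metadata]
name = project
version = 0.1.0
description = Look up celebrities born on a given date
long_description = file: README.md

[options]
packages = find:
install_requires =
    requests
    beautifulsoup4

[options.packages.find]
exclude =
    test
    test.*
"""


def _as_list(value: str) -> list[str]:
    """Split a multi-line setup.cfg value into its non-empty entries."""
    return [line.strip() for line in value.splitlines() if line.strip()]


def summarize_setup_cfg(text: str) -> dict:
    """Read the file as a plain INI parser sees it (a sketch, not setuptools itself)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "name": parser["metadata"]["name"],
        "version": parser["metadata"]["version"],
        "install_requires": _as_list(parser["options"]["install_requires"]),
        "excluded_packages": _as_list(parser["options.packages.find"]["exclude"]),
    }


if __name__ == "__main__":
    print(summarize_setup_cfg(SETUP_CFG))
```

In the `[options]` section, `packages = find:` asks setuptools to run its package discovery, and the `[options.packages.find]` section passes options such as `exclude` to it.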

# Final Challenge: Publish your package to PyPI

We are almost done!

To publish the package, you only need three small things:

1. Create a PyPI account
2. Create the distribution files
3. Install `twine` to publish the distribution files

Let's review each step


## Create the PyPI account

Creating an account on PyPI is fairly simple. Go to the [PyPI](https://pypi.org/) website and click on 'Register'.

<p align=center><img src=images/pypi.png width=600></p>

Then fill in your details.

## Create the Distribution Files

Packages on PyPI are not distributed as loose source files. Instead, they are wrapped into distribution files, so we need to create these before publishing.

This is where the `setup.py` file comes into play: we use it to create the distribution files. To do so, run the following line from the command line:

```
python setup.py sdist
```

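Make sure you run it from the root folder, where `setup.py` sits in the structure shown above. (Recent versions of setuptools deprecate running `setup.py` directly; the recommended replacement is the `build` package, run as `python -m build`.)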

Observe the files that have been generated now:

- A folder whose name ends in `.egg-info`
- A folder named `dist`

A "Python egg" is a logical structure embodying the release of a specific version of a Python project, comprising its code, resources, and metadata.

On the other hand, `dist` contains your package as a compressed `tar` archive (`.tar.gz`), known as a source distribution. Now we just need to publish it, and we will use `twine` to do so.
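Before uploading, it is worth checking what actually ended up in the archive (for example, that the `test` folder was not packaged). The sketch below uses the standard `tarfile` module; `list_sdist_contents` and `make_demo_sdist` are hypothetical helpers, and the demo archive `project-0.1.0.tar.gz` only mimics the layout of a real source distribution so the code runs even without a `dist` folder.

```python
import tarfile
import tempfile
from pathlib import Path


def list_sdist_contents(archive: Path) -> list[str]:
    """Return the sorted names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(member.name for member in tar.getmembers() if member.isfile())


def make_demo_sdist(dest_dir: Path) -> Path:
    """Build a tiny stand-in for dist/project-0.1.0.tar.gz (hypothetical name and contents)."""
    top = dest_dir / "project-0.1.0"
    for rel_path in ["PKG-INFO", "setup.py", "project/__init__.py", "project/example.py"]:
        path = top / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")
    archive = dest_dir / "project-0.1.0.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(top, arcname=top.name)  # adds the folder recursively
    return archive


if __name__ == "__main__":
    real_archives = sorted(Path("dist").glob("*.tar.gz"))
    if real_archives:
        for archive in real_archives:
            print(archive.name)
            for name in list_sdist_contents(archive):
                print("   ", name)
    else:
        # No dist/ folder here: inspect the demo archive instead
        with tempfile.TemporaryDirectory() as tmp:
            for name in list_sdist_contents(make_demo_sdist(Path(tmp))):
                print(name)
```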

## Install twine to publish the package

To upload your package to PyPI, you’ll use a tool called Twine:

```
pip install twine
```

Now use `twine` from the command line with the `upload` option, passing the `dist` files that we generated:

```
twine upload dist/*
```

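You will be asked for a username and password. Note that PyPI no longer accepts your account password for uploads: create an API token in your PyPI account settings, then enter `__token__` as the username and the token as the password: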

<p align=center><img src=images/pypi2.png width=700></p>

(Don't worry if the cursor doesn't move while you type the password; it is still being typed.)

And there you go: your package is ready to be used by other programmers!

# Additional Structures

We saw how to organize our project when it contains only one package. But a whole project might be composed of multiple packages:
```
root_folder/
│
│
├── name_project/
│   ├── __init__.py
│   ├── subpackage_1/
│   │   ├── __init__.py
│   │   ├── module_p1_1.py
│   │   └── module_p1_2.py
│   │
│   └── subpackage_2/
│       ├── __init__.py
│       ├── module_p2_1.py
│       └── module_p2_2.py
│
├── tests/
│   ├── project_1/
│   │   ├── test_module_p1_1.py
│   │   └── test_module_p1_2.py
│   │
│   └── project_2/
│       ├── test_module_p2_1.py
│       └── test_module_p2_2.py
│
├── .gitignore
├── setup.py
├── setup.cfg
├── LICENSE
└── README.md
```

The structure is essentially the same, simply don't forget to add the `__init__.py` in each package and you will be all set! Many other applications or libraries will have incredibly complex structures, but remember that many of those packages require years of polishing. Don't feel overwhelmed by them 💪
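The sketch below builds part of the multi-package tree above in a temporary folder, with an `__init__.py` at every package level, and imports a module from `subpackage_2` that itself imports from `subpackage_1` using an absolute import rooted at `name_project`. The module bodies (`greet`, `relay`) are hypothetical, and `sys.path` is patched only for the demo; in a real project, `pip install .` is what makes the package importable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents; every package level gets its own __init__.py
FILES = {
    "name_project/__init__.py": "",
    "name_project/subpackage_1/__init__.py": "",
    "name_project/subpackage_1/module_p1_1.py": (
        "def greet():\n"
        "    return 'hello from subpackage_1'\n"
    ),
    "name_project/subpackage_2/__init__.py": "",
    "name_project/subpackage_2/module_p2_1.py": (
        "from name_project.subpackage_1.module_p1_1 import greet\n"
        "\n"
        "def relay():\n"
        "    return greet().upper()\n"
    ),
}


def build_layout(root: Path) -> None:
    """Write the files above under `root`, which plays the role of root_folder/."""
    for rel_path, body in FILES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body)


def import_from(root: Path, dotted_name: str):
    """Import `dotted_name` with `root` temporarily placed on sys.path (demo only)."""
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.import_module(dotted_name)
    finally:
        sys.path.remove(str(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_layout(Path(tmp))
        module = import_from(Path(tmp), "name_project.subpackage_2.module_p2_1")
        print(module.relay())  # HELLO FROM SUBPACKAGE_1
```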

# Summary

- You saw the importance of keeping everything organized
- Your files should be well documented. The next time anyone downloads this project, they will be able to use the built-in `help()` function to check the methods and attributes of your classes in a very neat way
- Encapsulation, abstraction, and separation of concerns will help you keep all your projects clean
- Some files are crucial for publishing our package: LICENSE, setup, `__init__`...
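As an illustration of the point on documentation, docstrings are what `help()` displays. The `Celebrity` class below is hypothetical; any class with docstrings behaves the same way.

```python
class Celebrity:
    """A person whose birthday we want to look up (hypothetical example class).

    Attributes:
        name: Full name of the person.
        birth_year: Year in which the person was born.
    """

    def __init__(self, name: str, birth_year: int) -> None:
        self.name = name
        self.birth_year = birth_year

    def age_in(self, year: int) -> int:
        """Return the age this person turns during `year`."""
        return year - self.birth_year


if __name__ == "__main__":
    # Prints the class docstring plus each method's signature and docstring
    help(Celebrity)
```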


# Assessments

- Create a virtual environment and install your published package. From any directory, run `python -m project.example`, where `project` is the name you gave to your package. Wherever you go, you can now check which celebrities were born on a certain date (or simply go to Wikipedia, but this is way cooler)
- Now that you know how to publish your package, try to add a description to its PyPI page
- Add the repo URL corresponding to the package in the `setup.py` file
- Investigate how to publish updated versions of your package
- Another thing you should consider adding is exhaustive documentation. Check out [Sphinx](https://www.sphinx-doc.org/en/master/), which generates documentation for your project in an incredibly easy way