# Python Packages

Today we will learn how to convert a Python notebook into a package. By this point in the program, you should be familiar with essential Python tools such as NumPy, Pandas, Matplotlib and Sklearn. Creating our own Python packages lets us compartmentalize development tasks, reduce code redundancy and keep our projects modular.

For the purpose of this demonstration, we will be using one of my previous notebooks (Knapsack using GA). Understanding the algorithm is not important for this lesson, but if you're interested I recommend this YouTube video: https://www.youtube.com/watch?v=MacVqujSXWE

This notebook contains 3 short portions:
* Data Loading
* Algorithm Implementation
* Testing & Results

The Data Loading and Algorithm Implementation portions will be converted into modules (data.py & GA.py). The Testing & Results portion will only import the algorithm to produce the output.


### Text Editor

Being familiar with the Jupyter environment is great, but we encourage everyone to get familiar with at least one IDE and one text editor. An IDE is a collection of development tools intended to simplify the process of coding. Two IDEs I recommend for Python development are PyCharm and VS Code. If you would like more information on IDEs, Professor Whyte has covered them extensively during his Office Hours. In this session I would like to introduce the idea of text editors. Compared to IDEs, they are lightweight pieces of software with built-in functionality designed to ease and speed up the process of editing code. I will be using emacs commands as part of this notebook.

VIM (Recommended): https://realpython.com/vim-and-python-a-match-made-in-heaven/

Emacs (Personal UNIX Choice): https://wikemacs.org/wiki/Installing_Emacs_on_OS_X

### Data Loading

If you choose to reuse a function in different scripts or notebooks, copying and pasting it can take up unnecessary space. To avoid this, we can create a Python package and simply import the function. For example, the get_data() function downloads a URL’s data and returns it:

In [2]:
import os 
import pandas as pd
from urllib.request import urlretrieve 

URL = 'https://raw.githubusercontent.com/Castellanos96/SIADS593/main/Items.csv'

def get_data(filename='Items.csv', url=URL, force_download=False):
    # Download only if the file is missing or a fresh copy is requested
    if force_download or not os.path.exists(filename):
        urlretrieve(url, filename)

    # Read the local copy, using the first column as the index
    data = pd.read_csv(filename, index_col=0)
    assert all(data.columns == ['Item','Weight','Survival Points'])

    return data


In [3]:
get_data().head()

Unnamed: 0,Item,Weight,Survival Points
0,sleeping bag,15,15
1,rope,3,7
2,pocket knife,2,10
3,flashlight,5,5
4,bottle,9,8


Ideally, we would like to create a data.py file to gather our information for a project. We can start by making a new directory, which we will call week3.

In [1]:
mkdir week3

We can turn a directory into a Python package with the following command:

In [None]:
touch week3/__init__.py

If we open the file, it should be empty. This file initializes the package: Python runs it when the package is first imported. For now we can leave it blank.

In [None]:
emacs week3/__init__.py

To create our data file, we can use any text editor (I strongly recommend vim). After it launches, simply copy & paste the code you wish to turn into a module.

In [None]:
emacs week3/data.py

After the code has been pasted and saved, we should be able to import from the module like so:

In [4]:
from data import get_data
data = get_data()
data.head()

Unnamed: 0,Item,Weight,Survival Points
0,sleeping bag,15,15
1,rope,3,7
2,pocket knife,2,10
3,flashlight,5,5
4,bottle,9,8
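
The steps above (make a directory, add `__init__.py`, save a module inside it) can be sketched entirely in Python. This is a minimal illustration under stated assumptions, not the notebook's actual setup: the names `demo_pkg`, `greet` and `hello` are hypothetical, and the package is built in a temporary directory so nothing on disk is overwritten.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package in a temporary directory (all names are hypothetical).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").touch()  # same effect as `touch demo_pkg/__init__.py`
(pkg / "greet.py").write_text("def hello():\n    return 'hello from demo_pkg'\n")

# Python can only import packages whose parent directory is on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Import the module through its package path: <package>.<module>
greet = importlib.import_module("demo_pkg.greet")
print(greet.hello())
```

The same rule decides which import statement works for week3. `from data import get_data` resolves when the week3 directory itself is on the import path (for example, when the notebook runs from inside week3), while a notebook sitting next to the week3 directory would use `from week3.data import get_data`.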


### Algorithm Implementation 

We can follow the same steps as before with the genetic algorithm class. One small technical difference in this example is that I will load the data inside this module.

In [5]:
### Adding our data to feed directly into this module
from data import get_data
data = get_data()

In [None]:
emacs week3/GA.py 
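
Loading the data at the top of a module means the load runs when the module is first imported, and later imports reuse the already-loaded module. The sketch below illustrates that behavior with a hypothetical `demo_loader` module written to a temporary directory; its `load()` function is a stand-in for `get_data()` and returns a hard-coded list instead of downloading anything.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a hypothetical module whose top-level code loads data on import.
root = Path(tempfile.mkdtemp())
(root / "demo_loader.py").write_text(
    "load_count = 0\n"
    "def load():\n"
    "    global load_count\n"
    "    load_count += 1\n"
    "    return [1, 2, 3]\n"
    "data = load()  # runs when the module is first imported\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

first = importlib.import_module("demo_loader")
second = importlib.import_module("demo_loader")  # served from sys.modules

print(first is second)   # both names point at the same module object
print(first.load_count)  # the top-level load ran a single time
```

One consequence of this design is that after editing the module in a text editor, a running notebook keeps using the old version until the kernel is restarted or the module is reloaded with `importlib.reload`.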

### Using our Module

In [6]:
#importing our module for the Genetic Algorithm 
from GA import Genetic_Algorithm

In [14]:
#Provides you with the parameter details
#Genetic_Algorithm?

#Allows you to see the source code 
#Genetic_Algorithm??
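
The `?` and `??` helpers only exist inside IPython and Jupyter. In a plain Python script, the standard `inspect` module gives similar information. The sketch below uses a hypothetical stand-in class, `DemoAlgorithm`, because the real `Genetic_Algorithm` source is not shown here; the parameter names are taken from the call used later in this notebook, but the default values are assumptions.

```python
import inspect


class DemoAlgorithm:
    """Hypothetical stand-in for a class imported from a module."""

    def __init__(self, print_details=True, number_of_generations=50):
        # Default values here are assumptions made for illustration only.
        self.print_details = print_details
        self.number_of_generations = number_of_generations


# Parameter names and defaults, similar to what `DemoAlgorithm?` reports.
sig = inspect.signature(DemoAlgorithm)
print(sig)

# The docstring, cleaned of indentation.
print(inspect.getdoc(DemoAlgorithm))
```

For a class that lives in a saved .py file, `inspect.getsource` returns the source text, which is the closest equivalent to `??`.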

In [15]:
### Results of 1000 GA Simulations (50 Generations)
simulation_results = [] 
for _ in range(1000):
    GA_50_generations = Genetic_Algorithm(print_details=False, number_of_generations=50)
    GA_50_generations.adaption()
    # Keep the average fitness of the final generation
    simulation_results.append(GA_50_generations.fitness_avg_history[-1])
print("Generations = 50 , Average of fitness 1000 Simulations: ",sum(simulation_results)/len(simulation_results))

Generations = 50 , Average of fitness 1000 Simulations:  46.495850000000196


We can start the GitHub upload process from this point. 

TIP: Use the following command to add all of the Python files in a directory:

In [None]:
git add week3/*.py
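
To preview which files a pattern like `week3/*.py` would pick up, the same wildcard can be tried with `pathlib`. This is a rough sketch on a throwaway directory; the extra files `notes.txt` and `sub/extra.py` are hypothetical and only there to show what the pattern skips.

```python
import tempfile
from pathlib import Path

# Recreate a week3-like layout in a temporary directory (extra files are hypothetical).
root = Path(tempfile.mkdtemp())
pkg = root / "week3"
(pkg / "sub").mkdir(parents=True)
for name in ["__init__.py", "data.py", "GA.py", "notes.txt", "sub/extra.py"]:
    (pkg / name).touch()

# `*.py` matches Python files directly inside week3, not other file types
# and not files in subdirectories.
matched = sorted(p.name for p in pkg.glob("*.py"))
print(matched)
```

Files in nested folders would need their own pattern or a plain `git add week3`, which stages the whole directory.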

### GitHub Documentation : https://docs.github.com/en

#### Helpful Links for Beginners 

* Installing and configuring: https://docs.github.com/en/desktop/installing-and-configuring-github-desktop/overview/getting-started-with-github-desktop


* Inviting collaborators : https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-personal-account-on-github/managing-access-to-your-personal-repositories/inviting-collaborators-to-a-personal-repository


* Contributing and collaborating: https://docs.github.com/en/desktop/contributing-and-collaborating-using-github-desktop/adding-and-cloning-repositories/cloning-a-repository-from-github-to-github-desktop