# GENERAL SET UP

I am writing a program that will scrape the web for articles related to Biblical archeology, predict whether or not each article is relevant, and link relevant articles to specific passages in the Bible.

In order to get started, I will set up a number of components. 
1. I will set up a dedicated conda environment to manage dependencies for this project. I will need to install many specialized packages that I will not necessarily need for other projects, and each of these packages comes with its own set of dependencies. A dedicated conda environment lets me satisfy all of those dependencies without interfering with the requirements of a different project. Setting up a dedicated conda environment is like creating a playground for this project. I can do anything I want inside that environment without concern that I will mess up anything outside of it. The worst that could happen is that I mess up this environment so badly that I have to delete it and start over. To deal with that possibility, once I have the dependencies set up in a way that works, I will generate an explicit specification file that can be used to build an identical conda environment, if needed.
2. I will create a bat file that will automatically launch this project in the correct conda environment.
3. I have also created the bones of a localized Python package. As I create functions that I will call regularly, I will likely include them as modules in this package for easy retrieval and use. 
4. I will create a set of regularly used commands and store them as JSON that can be accessed through the nbextensions "Snippets Menu."
5. I will set up SQL database infrastructure to handle the data I will generate.
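
Item 1 mentions generating an explicit specification file. Here is a minimal sketch of what that could look like, driven from Python. The two conda commands (`conda list --explicit` and `conda create --name ... --file ...`) come from the conda documentation; the function names and the `spec-file.txt` file name are my own placeholders, not something already set up in this project.

```python
import shutil
import subprocess
from pathlib import Path


def export_spec_command(conda: str = "conda") -> list[str]:
    """Command that prints an explicit package list for the active environment."""
    return [conda, "list", "--explicit"]


def rebuild_command(env_name: str, spec_file: str, conda: str = "conda") -> list[str]:
    """Command that creates a new environment from an explicit spec file."""
    return [conda, "create", "--name", env_name, "--file", spec_file]


def write_spec_file(spec_file: str = "spec-file.txt") -> Path:
    """Run `conda list --explicit` and save its output to spec_file.

    Assumes this is called from inside the activated environment and
    that conda can be found on PATH.
    """
    conda = shutil.which("conda")
    if conda is None:
        raise FileNotFoundError("conda was not found on PATH")
    result = subprocess.run(
        export_spec_command(conda), capture_output=True, text=True, check=True
    )
    path = Path(spec_file)
    path.write_text(result.stdout)
    return path


# Show the command that would rebuild the environment (nothing is executed here).
print(" ".join(rebuild_command("link", "spec-file.txt")))
```

Calling `write_spec_file()` from inside the working environment would save the snapshot. The printed command is what I would type at a command prompt to rebuild from it.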

# CONDA ENVIRONMENT

I find this site to be invaluable: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#cloning-an-environment. It is my go-to source for answering any questions I have about conda environments.

For now, I am going to start simple. I will create a conda environment called *link*, built with Python 3.10, the latest version as I write this. To do this, I follow these steps:

1. Open a command prompt
2. Type this command into the command prompt: conda create -n link python=3.10
3. Activate *link* with this code: activate link
4. I like to look at what's in the environment before starting so I type this code into the command prompt: conda list
5. Any project is going to require pandas, numpy, matplotlib and of course Jupyter Notebook. I will install them at this point. I know I will use other packages for web scraping, natural language processing and predictive modeling, but I want to explore my options a little more before deciding which of these to use.
6. Install pandas with this code: conda install pandas. During the install process, numpy was also required, so it was installed as well. Two for one!
7. Install matplotlib with this code: conda install matplotlib
8. Install JupyterLab with this code: conda install -c conda-forge jupyterlab
9. I like using several of the nbextensions such as Snippets Menu, Move selected cells, Scratchpad, Spell-Check Markdown, and Table of Contents. As such, I'll also install nbextensions using this code: conda install -c conda-forge jupyter_contrib_nbextensions

After launching Jupyter Notebook in this environment, the home screen will have a tab for Nbextensions. I will select my desired extensions from this tab.
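
Once the installs above finish, a quick check from inside the *link* environment can confirm that everything landed. This is only a sketch: the `REQUIRED` list and the `missing_packages` helper are names I made up, and I am assuming each package's import name matches the name used in the install command.

```python
from importlib.util import find_spec

# Import names for the packages installed in the steps above
# (assumed to match the names used in the conda install commands).
REQUIRED = [
    "pandas",
    "numpy",
    "matplotlib",
    "jupyterlab",
    "jupyter_contrib_nbextensions",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print(f"Missing from this environment: {missing}")
else:
    print("All required packages were found")
```

Run in the wrong environment, this lists whatever is missing, which is also a handy way to notice that I forgot to activate *link*.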

# LAUNCH FILE

In both my day job and my personal life, I am constantly switching between projects that use Jupyter Notebook. Each time, I execute these four steps:

1. Open a command prompt
2. Activate the desired conda environment
3. Change the current work directory to my project folder
4. Launch Jupyter Notebook

While this process is fairly simple, it is the slow drip of water torture to repeat these steps over and over again. It also takes time and is distracting in meetings if I ever have to quickly pull up a notebook for discussion. In these situations, there may be several sets of team eyeballs watching my every move while waiting for the sweet release of a Jupyter Notebook to magically appear. The point is, while I love working in Jupyter Notebook, these four steps are a point of frustration.

As such, my mission in this post is to create a "launch button." This launch button should automatically execute these four steps when clicked. In addition, I would like this launch button to be dynamic so that it can be copied into any project folder with little to no modification. 

WHY USE A BATCH FILE?????




For this project I will be opening a command prompt, activating the *link* conda environment, setting the directory to my project folder, and launching Jupyter Notebook. This is not difficult to do, but it takes several steps and a minute or two every time. There is also the possibility of forgetting a step and ending up in the wrong project folder or the wrong conda environment. As such, I will create a launch file that will automatically launch Jupyter Notebook in the project directory using the dedicated conda environment.

Here are the steps for setting up this launch file:
1. Open a txt file.
2. I wrote this line to turn off echo: @ECHO OFF (reference: https://www3.rocketsoftware.com/rocketd3/support/documentation/mvb/32/refman/fileacct/echo-on_echo-off_command.htm)
3. I opened up a command prompt and activated my dedicated conda environment with this command: activate link
4. Once link was activated, I typed this line into the command prompt: conda info
5. I located **base environment**:
![image.png](attachment:image.png)
6. Using the file location in the **base environment** line, I wrote the following line in the txt file after @ECHO OFF, appending **Scripts\activate.bat** to that location: call C:\Users\david\anaconda3\Scripts\activate.bat
7. This line sets up conda in the command prompt window that opens when the file is run, much like opening an Anaconda Prompt
8. Next, I write this line to set my project directory: cd C:\Users\david\Projects\Linking Biblical Archeology\Jupyter\Jupyter notebooks 
9. Then I write this line to activate the *link* conda environment: call activate link
10. Finally, I write this line to launch Jupyter Notebook: call jupyter notebook
11. That's it. I saved this file as launch.bat and it was ready to use.

This is what the entire file looks like:
![image-2.png](attachment:image-2.png)

There you have it! I can double-click this launch "button" and it will automatically start me in the correct work folder with the correct conda environment and launch Jupyter Notebook. For other projects, I can modify the project directory and conda environment, but otherwise this code is completely reusable.
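
Because the finished file appears above only as a screenshot, here is a sketch that rebuilds the same lines as text and makes the two project-specific pieces (environment name and project folder) easy to swap. The `build_launch_script` and `write_launch_file` helpers are hypothetical names of my own. The line order follows the numbered steps, so the real file may differ slightly from what this produces.

```python
from pathlib import Path


def build_launch_script(conda_root: str, env_name: str, project_dir: str) -> str:
    """Assemble the batch file text, one line per step described above."""
    lines = [
        "@ECHO OFF",                                  # turn off command echoing
        rf"call {conda_root}\Scripts\activate.bat",   # set up conda in this window
        f"cd {project_dir}",                          # move to the project folder
        f"call activate {env_name}",                  # activate the project environment
        "call jupyter notebook",                      # launch Jupyter Notebook
    ]
    return "\n".join(lines) + "\n"


def write_launch_file(folder: str, script: str, name: str = "launch.bat") -> Path:
    """Save the script into the given folder and return the file path."""
    target = Path(folder) / name
    target.write_text(script)
    return target


# Values taken from the steps above.
script = build_launch_script(
    conda_root=r"C:\Users\david\anaconda3",
    env_name="link",
    project_dir=r"C:\Users\david\Projects\Linking Biblical Archeology\Jupyter\Jupyter notebooks",
)
print(script)
```

For another project, I would call `build_launch_script` with a different `env_name` and `project_dir`, then pass the result to `write_launch_file` along with that project's folder.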

# SQL PLATFORM

The first thing I need to do is decide which SQL platform to use. This article was helpful: https://towardsdatascience.com/databases-101-how-to-choose-a-python-database-library-cf19d1157d45

I have decided to go with PostgreSQL. It appears to have more to offer than both SQLite and MySQL, although it also appears to be more difficult to set up. I could probably get away with using SQLite for this project since I'm not concerned about security and everything I do will be done on my laptop. However, I decided to go with the more complex PostgreSQL because it allows flexibility for future projects.

I downloaded PostgreSQL from here: https://www.postgresql.org/download/

# DEPENDENCY SET UP

```python
import os
import pandas as pd
import numpy as np
import sqlite3
from sqlite3 import Error

# Set project folder as directory
os.chdir(r'C:/Users/david/Projects/Linking Biblical Archeology/Jupyter')

# Remove row and column limits
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
```
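
The cell above imports `sqlite3` and its `Error` class, though no connection code exists yet. Here is a minimal sketch of how those two imports are typically used together. The `create_connection` helper, the `articles` table, and the example URL are all placeholders of mine, and the database lives in memory so nothing is written to disk. It only illustrates the imported module; a PostgreSQL setup would need a different driver.

```python
import sqlite3
from sqlite3 import Error


def create_connection(db_path=":memory:"):
    """Open a SQLite connection, returning None if the attempt fails."""
    try:
        return sqlite3.connect(db_path)
    except Error as exc:
        print(f"Could not connect to {db_path}: {exc}")
        return None


rows = []
conn = create_connection()
if conn is not None:
    # Hypothetical table for scraped articles and their relevance label.
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, url TEXT, relevant INTEGER)"
    )
    conn.execute(
        "INSERT INTO articles (url, relevant) VALUES (?, ?)",
        ("https://example.com/dig-report", 1),
    )
    conn.commit()
    rows = conn.execute("SELECT url, relevant FROM articles").fetchall()
    conn.close()

print(rows)
```

Passing values through `?` placeholders rather than building the SQL string by hand is a habit worth keeping once real scraped text starts flowing into the table.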