# Project.ipynb

This notebook contains documentation to run everything in this project.

## Prerequisites

The following subsections walk the user through downloading the necessary data so that each Python notebook or script in this project can be run.  The required prerequisites **must** be completed before proceeding further.  The optional prerequisites can be skipped unless a later step notes that they are required.

### Required

In [2]:
# Install all requirements to run everything in this project
!pip install -r requirements.txt

In [2]:
# This script downloads and propagates the datafiles required to run each script
%run download_output.py

Downloading...
From (uriginal): https://drive.google.com/uc?id=1C4sxd3439a6lAoK5X3K-CfE1WmhcjsH2
From (redirected): https://drive.google.com/uc?id=1C4sxd3439a6lAoK5X3K-CfE1WmhcjsH2&confirm=t&uuid=698b8c5d-3dba-4912-8a36-c60880a92ae3
To: /home/aidan/Git-Repositories/nlpclass-1231-g-the_3rd_times_the_charm-group-project/output.zip
100%|██████████| 137M/137M [00:17<00:00, 7.76MB/s]
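
As a quick sanity check after the download, the sketch below confirms that an archive exists and is a readable zip file. It assumes `output.zip` is still present in the project root after `download_output.py` finishes, which may not be the case if the script removes the archive after extracting it; `archive_summary` is a hypothetical helper, not part of this project.

```python
from pathlib import Path
import zipfile


def archive_summary(zip_path):
    """Return (is_valid, member_count) for the archive at zip_path."""
    path = Path(zip_path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False, 0
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        if archive.testzip() is not None:
            return False, 0
        return True, len(archive.namelist())


if __name__ == "__main__":
    # Assumed location: the log above shows the archive saved as output.zip.
    ok, count = archive_summary("output.zip")
    print(f"output.zip valid: {ok}, members: {count}")
```

If the check reports an invalid archive, re-running the download cell is the simplest remedy.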


### Optional

#### Configuring GPU for PyTorch and TensorFlow

The included requirements.txt file installs the CPU-**only** builds of PyTorch and TensorFlow.  To use a GPU, refer to the [PyTorch installation page](https://pytorch.org/get-started/locally/) or the [TensorFlow installation page](https://www.tensorflow.org/install/pip) and install the appropriate GPU version.  Note that this project used PyTorch version 2.0.0 and TensorFlow version 2.12.0.
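
After installing a GPU build, it can help to confirm that the installed versions match the ones named above and that each framework can see a GPU. The following is a minimal sketch, not part of this project: the helper names are hypothetical, and it assumes the packages are installed under the pip distribution names `torch` and `tensorflow`.

```python
from importlib import metadata

# Versions stated in this notebook; the distribution names are assumed.
EXPECTED_VERSIONS = {"torch": "2.0.0", "tensorflow": "2.12.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(installed, expected):
    """True when versions are equal, ignoring a local build tag such as '+cu118'."""
    if installed is None:
        return False
    return installed.split("+")[0] == expected


def gpu_visible():
    """Report whether each framework can see a GPU (None if not installed)."""
    report = {"torch": None, "tensorflow": None}
    try:
        import torch
        report["torch"] = torch.cuda.is_available()
    except ImportError:
        pass
    try:
        import tensorflow as tf
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        pass
    return report
```

Calling `gpu_visible()` in a notebook cell returns a dictionary such as `{"torch": False, "tensorflow": False}` on a CPU-only install, and comparing `installed_version(name)` against `EXPECTED_VERSIONS[name]` with `matches_expected` flags any version drift.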

#### Download Models

In [4]:
# This script downloads the final models used in this project.  Note, depending on your internet speed this could take quite a while!
!python download_models.py

## Directories

The code in this project is organized into the following directories:
- bert
- lstm
- nyt
- result_visualizations
- training_data_formulation
- twitter
- ./

The following sections briefly describe what each subdirectory contains.  Each directory also has a README.md that describes every script in it in detail; we encourage the reader to read these.  For instance, to reproduce the evaluation (i.e., the accuracies) of the BERT model, read `bert/README.md`.  That README documents how to run `test.ipynb` and discusses its output, `bert_results_table.png`, which shows the accuracy of the model.
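
To locate these READMEs quickly, the sketch below maps each subdirectory listed above to its `README.md`, if one is found. It assumes the code is run from the project root; `find_readmes` is a hypothetical helper introduced only for illustration.

```python
from pathlib import Path

# Subdirectories listed in this notebook.
SUBDIRECTORIES = [
    "bert",
    "lstm",
    "nyt",
    "result_visualizations",
    "training_data_formulation",
    "twitter",
]


def find_readmes(project_root="."):
    """Map each subdirectory name to its README.md path, or None if missing."""
    root = Path(project_root)
    readmes = {}
    for name in SUBDIRECTORIES:
        candidate = root / name / "README.md"
        readmes[name] = candidate if candidate.is_file() else None
    return readmes


if __name__ == "__main__":
    for name, readme in find_readmes().items():
        print(f"{name}: {readme if readme else 'README.md not found'}")
```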

### ./

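This directory contains (1) all subdirectories (see below) and (2) the top-level files referenced in this notebook: `requirements.txt`, `download_output.py`, and `download_models.py`.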

### bert
This directory contains all the code to train and test the BERT model.  Please see `bert/README.md` for details on each script in the directory.

### lstm

This directory contains all the code to train and test the LSTM model.  Please see `lstm/README.md` for details on each script in the directory.

### nyt

This directory contains all the code used to scrape data from the New York Times and to visualize the scraped data.  Please see `nyt/README.md` for details on each script in the directory.

### result_visualizations

This directory contains all the code to produce the visualizations of the LSTM and BERT models applied to the scraped data.  Please see `result_visualizations/README.md` for details on each script in the directory.

### training_data_formulation
This directory contains all the code to clean and formulate the training data used in this project.  Please see `training_data_formulation/README.md` for details on each script in the directory.

### twitter

This directory contains all the code used to scrape data from Twitter and to visualize the scraped data.  Please see `twitter/README.md` for details on each script in the directory.