This project provides a Flask API that uses the BERT language model to perform sentiment analysis. It also includes a web scraping component that collects headlines from Yahoo Finance and analyzes their sentiment. A live demo of the features below is available on Hugging Face, where the trained model is deployed behind a Gradio front end.
- Sentiment Analysis
  - Utilizes the BERT model to analyze the sentiment of customized input text.
  - Supports three sentiment categories: neutral, negative, and positive.
  - Returns the sentiment of the input text to the user.
- Yahoo Finance Web Scraping
  - Scrapes headlines from Yahoo Finance using BeautifulSoup.
  - Analyzes the sentiment of each headline using the BERT model.
  - Returns a list of headlines with their corresponding sentiment in JSON format.
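The two features above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the label order in `LABELS`, the `<h3>` selector for headlines, the scraped URL, and the `headline`/`sentiment` JSON keys are all assumptions made for the example. The heavy libraries are imported inside the functions that need them so the small helpers can be used on their own.

```python
import json
import math

# Assumed label order; the real mapping depends on how the model was trained.
LABELS = ["neutral", "negative", "positive"]


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def classify(text, model_dir="saved_model", tokenizer_dir="saved_tokenizer"):
    """Classify one string with the locally saved BERT model.

    Reloads the model on every call for simplicity; a real server would
    load it once at startup.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)[0]


def fetch_headlines(url="https://finance.yahoo.com/"):
    """Scrape headline text; the <h3> selector is a guess at the page layout."""
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    texts = (tag.get_text(strip=True) for tag in soup.find_all("h3"))
    return [text for text in texts if text]


def headlines_to_json(headlines, classifier=classify):
    """Pair each headline with its sentiment and serialize the list as JSON."""
    return json.dumps(
        [{"headline": h, "sentiment": classifier(h)} for h in headlines]
    )
```

With the model files in place, `headlines_to_json(fetch_headlines())` would produce the JSON list described above; passing a different `classifier` makes it easy to try the pipeline without loading BERT.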
- sentiment-analysis.py: the sentiment analysis model, including training and evaluation
- app.py: the main application file, responsible for running the sentiment analysis back end
- index.html: front end of the application
- dataset-cleaning.py: code for cleaning and preparing the pre-existing datasets
- plot.py: plots visualizations of our model's results
- requirements.txt: project dependencies
- training_metrics.pkl: training results for loss and accuracy
- /datasets: raw and combined datasets
- /plots: configured visualization of our model's results
- /saved_model: config file for the model (Note: the model is saved on HuggingFace due to GitHub size restrictions)
- /saved_tokenizer: tokenizer files for our model
Ensure you have the following installed to run the front end application:
- Python 3.7+
- numpy < 2
- Flask
- torch
- transformers
- beautifulsoup4
- requests
Note that for this to work, you will need to add the saved model from Hugging Face to the saved_model folder.
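Before starting the app, it can help to confirm that the downloaded model actually landed in saved_model. The sketch below assumes the model was saved with the Hugging Face `save_pretrained` method, which writes `config.json` plus a single weights file named `model.safetensors` or `pytorch_model.bin`; if the files on Hugging Face use other names, adjust the constants. The helper name `missing_model_files` is made up for this example.

```python
from pathlib import Path

# File names written by Hugging Face `save_pretrained`; which weight file is
# present depends on how the model was saved (assumed, not confirmed here).
CONFIG_FILE = "config.json"
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def missing_model_files(model_dir="saved_model"):
    """Return a list of problems in model_dir; an empty list means it looks complete."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return [f"{model_dir} is not a directory"]
    problems = []
    if not (folder / CONFIG_FILE).is_file():
        problems.append(f"missing {CONFIG_FILE}")
    if not any((folder / name).is_file() for name in WEIGHT_FILES):
        problems.append("missing weights (" + " or ".join(WEIGHT_FILES) + ")")
    return problems
```

Calling `missing_model_files()` from the project root returns an empty list when both the config and a weights file are present.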
- Clone the repository:
  git clone https://github.com/your-username/sentiment-analysis-bert.git
- Navigate to the project directory:
  cd sentiment-analysis-bert
- Install the required packages:
  pip install -r requirements.txt
- Run the Flask API:
  python3 app.py
- Open the index.html file using a live server. You can do this by opening the project repository in VSCode, right clicking on 'index.html' and clicking 'Open with Live Server'.
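Once the API is running, it can also be called without the front end. The sketch below uses only the standard library and assumes Flask's default development address (port 5000). The `/analyze` route and the `text` field are hypothetical placeholders; check app.py for the real route names and request format.

```python
import json
import urllib.request

# Flask's development server listens on port 5000 by default.
BASE_URL = "http://127.0.0.1:5000"


def build_request(text, route="/analyze"):
    """Build a JSON POST request; the route and 'text' field are hypothetical."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text):
    """Send the text to the running API and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(text), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started via `python3 app.py`, `analyze("Markets closed higher today.")` would return whatever JSON the endpoint responds with.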
The BERT model and tokenizer are saved in the saved_model and saved_tokenizer directories, respectively. You can update these files by retraining the model and tokenizer with your own dataset.
Contributions are welcome! If you would like to contribute to this project, please fork the repository and submit a pull request.
This project uses the following libraries and resources:
- Hugging Face Transformers for the BERT model and tokenizer
- BeautifulSoup for web scraping
- Yahoo Finance for providing the headlines data
- Sentiment Analysis with BERT Using Hugging Face
The sentiment analysis models were trained on a combination of the following datasets: