
CSE6242TEAM135/Nosleep-Recommender-System


Reddit/nosleep Recommender System

Working software can be viewed at

Important

To run a search, you need a story ID. We chose story IDs because they are unique: if the system is hooked up to Reddit, the parameter passed to it will be a story ID. Our dataset contains a large number of story IDs from posts made between 01/01/2019 and 07/01/2019 (six months).
The following story IDs can be used for testing:
abg1pv, abg4dj, abg8cw, abgcly, abgd7w, abgfyo, abgjjn, abgyue, abgzs9, abhcls, abhgjs

The complete list of story IDs is in storyids.txt: https://github.com/CSE6242TEAM135/Nosleep-Recommender-System/blob/master/storyids.txt
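Before sending IDs to the system, it can help to clean them up. The sketch below is not part of the repository. It splits text on commas and whitespace, drops duplicates, and keeps only tokens that look like Reddit post IDs (lowercase base-36). The 5 to 8 character length bound and the assumption that storyids.txt separates IDs with commas or whitespace are both guesses based on the samples above.

```python
import re
from pathlib import Path
from typing import List

# Reddit post IDs are base-36 strings (digits and lowercase letters).
# ASSUMPTION: the 5-8 character length bound is inferred from the sample IDs.
STORY_ID_RE = re.compile(r"^[0-9a-z]{5,8}$")


def parse_story_ids(text: str) -> List[str]:
    """Split on commas/whitespace; keep well-formed IDs in order, without duplicates."""
    seen = set()
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        if token and STORY_ID_RE.match(token) and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


def load_story_ids(path: str = "storyids.txt") -> List[str]:
    """Read storyids.txt (ASSUMPTION: IDs separated by commas and/or whitespace)."""
    return parse_story_ids(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(parse_story_ids("abg1pv , abg4dj, abg8cw, abgcly"))
```

`typing.List` is used instead of `list[str]` so the sketch also runs on Python 3.7, the version listed below.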


The application is web based and assumes AWS infrastructure.

AWS components used:
• S3
• DynamoDB
• EC2 (virtual machine running Red Hat)

The following libraries need to be installed on the Red Hat machine:

• Python 3.7.4
• Django
• Pandas
• Boto3
• NLTK
• wordcloud
• Plotly
• NetworkX

To install a Python library, use this syntax: python3 -m pip install --user plotly
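To check which of the libraries above are still missing, you could run a small script like the one below. It is a sketch, not a script shipped with the project. It uses `importlib.util.find_spec`, which returns `None` when a module cannot be found, and prints the matching `pip install` command for each missing library. The mapping from display names to import names is our assumption.

```python
import importlib.util
from typing import Dict, List

# Display name -> import name for the libraries listed above.
# ASSUMPTION: the pip package name is the lowercased display name.
REQUIRED_MODULES: Dict[str, str] = {
    "Django": "django",
    "Pandas": "pandas",
    "Boto3": "boto3",
    "NLTK": "nltk",
    "Wordcloud": "wordcloud",
    "Plotly": "plotly",
    "Networkx": "networkx",
}


def missing_modules(modules: Dict[str, str]) -> List[str]:
    """Return the display names of modules that cannot be found on sys.path."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    for name in missing:
        print(f"Missing {name}: python3 -m pip install --user {name.lower()}")
    if not missing:
        print("All required libraries are installed.")
```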

In addition, the AWS CLI must be installed and configured with the appropriate access keys so that the application can communicate with DynamoDB.
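Once the AWS CLI is configured, boto3 reads the same credentials, so a story can be fetched from DynamoDB by its ID. The sketch below shows how such a lookup might look. The table name `nosleep_stories` and the partition key `story_id` are hypothetical placeholders, because this README does not document the real ones. The table is passed in as a parameter, so the lookup logic can be tested without AWS access.

```python
from typing import Any, Dict, Optional

# HYPOTHETICAL names: the project's real table and key names are not documented here.
TABLE_NAME = "nosleep_stories"
KEY_NAME = "story_id"


def get_dynamodb_table(name: str = TABLE_NAME, region: Optional[str] = None) -> Any:
    """Return a boto3 DynamoDB Table; credentials come from the AWS CLI configuration."""
    import boto3  # imported lazily so fetch_story can be tested without boto3
    return boto3.resource("dynamodb", region_name=region).Table(name)


def fetch_story(table: Any, story_id: str) -> Optional[Dict[str, Any]]:
    """Look up one story by ID; returns None if no item has that key."""
    response = table.get_item(Key={KEY_NAME: story_id})
    return response.get("Item")  # "Item" is absent from the response when not found
```

On the EC2 machine, `fetch_story(get_dynamodb_table(), "abg1pv")` would return the stored item, provided the placeholder names match the actual table.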

To install the software, log into the EC2 instance and run these commands:
• git clone https://github.com/CSE6242TEAM135/Nosleep-Recommender-System.git
This pulls all the required files.
• Start the server:
python3 Nosleep-Recommender-System/NoSleepRecommender_DJANGO/manage.py runserver 0.0.0.0:8000 &
• Finally, press Ctrl+A followed by D. This is the GNU screen detach sequence, so it assumes the commands were run inside a screen session.
The server keeps running in the background, and you can safely exit the CLI.
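To confirm that the server is still listening after you detach, a simple TCP check on port 8000 is enough. The sketch below is our own addition, not a project script. It only tests whether something accepts connections on the port. It does not check that Django is serving pages correctly.

```python
import socket


def server_is_up(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Server is up" if server_is_up() else "Nothing is listening on port 8000")
```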

Repository structure:

The Model folder contains our machine learning models: topic modeling, sentiment analysis (using NLTK and VADER), and the scoring methodology.
The NoSleepRecommender_DJANGO folder contains the Django web server and the word cloud and network graph files.
storyids.txt contains the complete list of story IDs that can be fetched from AWS.
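The Model folder combines topic modeling and sentiment analysis into a score, but this README does not describe the actual formula. The sketch below shows one hypothetical way to blend the two signals, and it is not the project's methodology. It combines the cosine similarity of two stories' topic vectors with how close their VADER compound sentiment scores are. The weight `alpha` and the input format are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(q_topics: Sequence[float], c_topics: Sequence[float],
          q_sent: float, c_sent: float, alpha: float = 0.7) -> float:
    """HYPOTHETICAL blend of topic similarity and sentiment closeness.

    Sentiments are assumed to be VADER compound scores in [-1, 1], so their
    absolute difference lies in [0, 2]; halving it maps closeness to [0, 1].
    """
    topic_sim = cosine(q_topics, c_topics)
    sentiment_sim = 1.0 - abs(q_sent - c_sent) / 2.0
    return alpha * topic_sim + (1.0 - alpha) * sentiment_sim


def recommend(query_id: str, stories: Dict[str, Tuple[List[float], float]],
              k: int = 5) -> List[Tuple[str, float]]:
    """Rank other stories by score; stories maps id -> (topic_vector, compound)."""
    q_topics, q_sent = stories[query_id]
    scored = [(sid, score(q_topics, topics, q_sent, sent))
              for sid, (topics, sent) in stories.items() if sid != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A weighted sum keeps both signals on the same [0, 1] scale, so `alpha` directly controls how much topic similarity dominates sentiment.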