To run a search, you need a story ID. The rationale is that if this system is hooked up to Reddit, the parameter passed to it will be a story ID, which is unique. We have a large dataset of story IDs covering 01/01/2019 to 07/01/2019 (six months).
We are providing some story IDs, which you can use to test:
abg1pv, abg4dj, abg8cw, abgcly, abgd7w, abgfyo, abgjjn, abgyue, abgzs9, abhcls, abhgjs
PS: The complete list of story IDs can be found in storyids.txt here: https://github.com/CSE6242TEAM135/Nosleep-Recommender-System/blob/master/storyids.txt
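As a minimal sketch, the sample IDs or a local copy of storyids.txt could be parsed and sanity-checked in Python before being used in a search. The helper name parse_story_ids is hypothetical, and the assumption that IDs are short lowercase alphanumeric tokens separated by commas or whitespace is inferred from the sample IDs, not taken from the project code.

```python
import re

# Assumption: story IDs look like the samples listed in this README
# (lowercase letters and digits only). The actual layout of
# storyids.txt is not documented here.
STORY_ID_PATTERN = re.compile(r"^[a-z0-9]+$")


def parse_story_ids(text):
    """Split text on commas/whitespace; keep unique, well-formed IDs in order."""
    seen = set()
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        if token and STORY_ID_PATTERN.match(token) and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


sample = "abg1pv , abg4dj, abg8cw, abgcly"
print(parse_story_ids(sample))
```

To use it on the full list, the contents of a downloaded storyids.txt can be read into a string and passed to the same function.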
AWS components and software used
• S3
• DynamoDB
• EC2 (Virtual Machine – Red Hat)
• Python 3.7.4
• Django
• Pandas
• Boto3
• NLTK
• Wordcloud
• Plotly
• NetworkX
In addition, the AWS CLI must be installed and configured with appropriate access keys, which allow the system to communicate with DynamoDB.
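One way to confirm that the configured credentials can reach DynamoDB is a small Boto3 check such as the sketch below. The function name list_dynamodb_tables is hypothetical, and the sketch assumes the credentials and default region come from the AWS CLI configuration. It does not assume any particular table name used by this project.

```python
def list_dynamodb_tables(client=None):
    """Return the DynamoDB table names visible to the configured credentials."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stub.
        import boto3

        # Picks up the access keys and region configured for the AWS CLI.
        client = boto3.client("dynamodb")
    # list_tables returns one page of results (up to 100 table names).
    response = client.list_tables()
    return response.get("TableNames", [])
```

Calling list_dynamodb_tables() with no arguments on the EC2 instance should return a list of table names if the keys are valid. An exception at this point usually indicates missing or incorrect credentials or region settings.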
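Log in to the EC2 instance and run these commands:
• git clone https://github.com/CSE6242TEAM135/Nosleep-Recommender-System.git
This pulls all the required files.
• Then start the server:
python3 Nosleep-Recommender-System/NoSleepRecommender_DJANGO/manage.py runserver 0.0.0.0:8000 &
The trailing & starts the server as a background job, listening on port 8000 on all network interfaces.
• Then press Ctrl+A followed by D.
This detaches the terminal session (the key sequence applies when working inside a terminal multiplexer such as GNU screen), so the server keeps running in the background and you can safely exit the CLI.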
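After starting the server, a quick way to verify that it is accepting connections is a small TCP check like the sketch below. The function name is_server_listening is hypothetical, and the default port 8000 is taken from the runserver command. The check only confirms that something is listening on the port, not that the Django application is responding correctly.

```python
import socket


def is_server_listening(host="127.0.0.1", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the EC2 instance itself, is_server_listening() should return True once the server is up. From another machine, pass the instance's public address as host; this also requires port 8000 to be open in the instance's security group.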
The Model folder contains our machine learning models, which include topic modeling, sentiment analysis (using NLTK and VADER), and the scoring methodology.
The NoSleepRecommender_DJANGO folder contains the Django web server and the word cloud and network graph files.
storyids.txt contains the complete list of story IDs that can be fetched from AWS.