reddit-flair-detection

This repo contains scripts for detecting flairs of Reddit posts, served in real time by a hosted Flask server.

Project Objective:

The objectives of this project are divided into five parts :

  • Part - I : Collecting and Building Reddit data
  • Part - II : Exploratory Data Analysis for Collected data
  • Part - III : Processing and Modelling using various Algorithms
  • Part - IV : Building WEB-APP
  • Part - V : Deploying WEB-APP on Heroku

Approach :

  • I first collected and built the Reddit India data by scraping with three methods :

    1. Using requests module
    2. Using PRAW wrapper module
    3. Using PushShift wrapper module
  • [Sample of data] :
    df1
    df2
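
The first scraping method (plain HTTP requests) can be sketched roughly as below. This is only an illustration, not the notebook's actual code: the function names `extract_posts` and `fetch_listing`, the chosen columns, and the User-Agent string are assumptions. It relies on Reddit's public listing JSON, where each post sits under `data.children[].data` with fields such as `title` and `link_flair_text`.

```python
from typing import Any


def extract_posts(listing: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a Reddit listing JSON into rows with title, body, flair and comment count."""
    rows = []
    for child in listing.get("data", {}).get("children", []):
        post = child.get("data", {})
        flair = post.get("link_flair_text")
        if not flair:
            continue  # posts without a flair cannot be used as labelled samples
        rows.append({
            "title": post.get("title", ""),
            "body": post.get("selftext", ""),
            "flair": flair,
            "num_comments": post.get("num_comments", 0),
        })
    return rows


def fetch_listing(subreddit: str = "india", limit: int = 100) -> dict[str, Any]:
    """Download one page of new posts (needs network access and the requests package)."""
    import requests  # imported lazily so extract_posts can be used offline

    response = requests.get(
        f"https://www.reddit.com/r/{subreddit}/new.json",
        params={"limit": limit},
        headers={"User-Agent": "flair-detection-sketch/0.1"},  # hypothetical UA string
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


# Offline sample shaped like a Reddit listing, so the parsing step can be tried without network
sample = {"data": {"children": [
    {"data": {"title": "Budget 2020 discussion", "selftext": "",
              "link_flair_text": "Politics", "num_comments": 12}},
    {"data": {"title": "No flair here", "selftext": "",
              "link_flair_text": None, "num_comments": 0}},
]}}
posts = extract_posts(sample)
```

With network access, `extract_posts(fetch_listing())` would return the same kind of rows for live posts, which can then be loaded into a DataFrame.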

  • Then I built my data intuition around the collected data by analysing the last four months of data (Jan, Feb, Mar, Apr 2020) and drawing out various insights using charts.

  • Then I analysed the comments of various threads to build an overall intuition.

  • Some samples of the charts and graphs are shown below :

  • January data :

df2

df2

  • February data :

df2

df2

  • March data

df2

df2

  • April data

df2

df2

  • Most frequently occurring words :

df2
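
A word-frequency count like the one charted above can be approximated with the standard library alone. This is a minimal sketch; the sample titles, the `top_words` helper and the tiny stop-word list are made up for illustration and are not taken from the notebooks.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis would use a fuller one
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "for", "on"}


def top_words(titles, n=3):
    """Return the n most common non-stop-words across all titles."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z']+", title.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


titles = [
    "Lockdown extended in the state",
    "Lockdown rules for the week",
    "State budget announced",
]
common = top_words(titles, n=2)
```

Here `common` holds the two most frequent words with their counts, which is the kind of data a bar chart or word cloud would be drawn from.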

  • Comments sentiments :

df2

  • Comments polarity :

df2

  • After that, I processed the text data using the following :
  1. Applied CountVectorizer
  2. Applied TfidfVectorizer
  3. Constructed a Pipeline in scikit-learn for training and testing algorithms.
  • After that, I analysed the cross-validated scores on the testing samples :

df2
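
The vectorizer-plus-Pipeline setup described above might look roughly like this. It is a sketch under assumptions: the toy texts and labels are invented, and `LogisticRegression` is picked only as an example classifier, since this README does not name the algorithms that were compared.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Toy data standing in for post titles and their flairs
texts = [
    "cricket match won by india", "new cricket league schedule",
    "football team wins the cup", "hockey match tonight",
    "election results announced", "parliament passes new bill",
    "minister speaks on budget", "state election campaign begins",
]
labels = ["Sports"] * 4 + ["Politics"] * 4

# TfidfVectorizer is equivalent to CountVectorizer followed by TF-IDF weighting
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),  # example classifier, an assumption
])

# 2-fold cross-validation; each fold refits the vectorizer on its training split only
scores = cross_val_score(pipeline, texts, labels, cv=2)

# Fit on all data and predict the flair of an unseen title
pipeline.fit(texts, labels)
prediction = pipeline.predict(["cricket match schedule"])[0]
```

Keeping the vectorizer inside the Pipeline means the vocabulary is learned only from the training folds, so the cross-validated scores are not inflated by leakage from the held-out samples.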

  • Built the web app and created an API endpoint that can be tested with a POST request.
    Ex.
import requests

# Upload file.txt to the automated testing endpoint
with open('file.txt', 'rb') as f:
    r = requests.post('http://flair-reddit-predict.herokuapp.com/automated_testing', files={'upload_file': f})
  • Finally, deployed the WEB-APP.
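
On the server side, the `/automated_testing` endpoint used in the request example above could be sketched as follows. The route name and the `upload_file` field come from that example; everything else is an assumption, including the idea that each line of the uploaded file is one post, the JSON response shape, and the `predict_flair` placeholder that stands in for the trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_flair(text: str) -> str:
    """Hypothetical placeholder for the trained model; always returns a fixed label."""
    return "Unknown"


@app.route("/automated_testing", methods=["POST"])
def automated_testing():
    # The client sends the file under the multipart field name 'upload_file'
    uploaded = request.files.get("upload_file")
    if uploaded is None:
        return jsonify({"error": "upload_file is required"}), 400

    # Assumption: one post per line in the uploaded text file
    lines = uploaded.read().decode("utf-8").splitlines()
    predictions = {
        line.strip(): predict_flair(line) for line in lines if line.strip()
    }
    return jsonify(predictions)
```

Swapping the body of `predict_flair` for a call to the saved model would turn this into a working prediction endpoint.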

Real time Flask server hosted on Heroku:

  • Open the following URL, hosted on pythonanywhere.com using a Flask server :

Getting started :

FLASK solution :

  • Open a command prompt (terminal).
  • Download the project files by running the following command in the directory where you want to run the script :
git clone https://github.com/souravs17031999/reddit-flair-detection
  • Move to the main project directory where the project was downloaded.
  • Move to the directory reddit_app.
  • Now run the following :
    (for Windows)

set FLASK_APP=app.py
python -m flask run

(for other terminals)

$ export FLASK_APP=app.py
$ flask run

Other troubleshooting issues related to flask server

  • The local server should now start; open the [local url port] shown in the terminal.
    (Most probably it will be http://127.0.0.1:5000/ , or possibly another port)

Navigating Project Directory :

  • File containing Data collection and building - Notebook1
  • File containing EDA - Notebook2
  • File containing Modelling - Notebook3
  • Scripts for Flask APP - WebApp
  • Training Data - Data
  • Other data for analysing - OtherData
  • Pre-trained Model - Model
  • Documented experiment log containing errors and solutions - logs

Sample of WebApp in action :

df2

⭐️ this project if you liked it!