Sam-Nielsen-Dot edited this page Jun 26, 2021 · 5 revisions

Welcome to the depressionAnalysis wiki!


Classification

classify(text, mode="string", switchpoint=0.95, model_id=96, classifier=None)

Classifies a string as either "Positive" or "Negative" for depression.

Modes

  • string - returns output as either "Positive" or "Negative"
  • int - returns output as either 1 for positive or 0 for negative
  • probabilities - returns a list of tuples containing the value and the associated probability

Switchpoint

A value between 0 and 1 that the positive probability must exceed for the text to register as positive.
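The sketch below shows how mode and switchpoint could work together to turn a single positive probability into classify's output. It is not the library's implementation. The helper name format_prediction is hypothetical, and so is the exact layout of the "probabilities" list, assumed here to be a ("Positive", p) tuple followed by ("Negative", 1 - p).

```python
def format_prediction(positive_prob, mode="string", switchpoint=0.95):
    """Hypothetical helper: map a positive-class probability to classify()'s output modes."""
    if not 0.0 <= positive_prob <= 1.0:
        raise ValueError("positive_prob must be between 0 and 1")
    # The text only counts as positive when its probability is strictly above the switchpoint.
    is_positive = positive_prob > switchpoint
    if mode == "string":
        return "Positive" if is_positive else "Negative"
    if mode == "int":
        return 1 if is_positive else 0
    if mode == "probabilities":
        # Assumed layout: one (label, probability) tuple per class; switchpoint is not applied.
        return [("Positive", positive_prob), ("Negative", 1.0 - positive_prob)]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with the default switchpoint of 0.95, a text scored at 0.9 is still "Negative". Lowering switchpoint to 0.8 makes the same text "Positive".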

Model id

Id of the model used to classify the text, an int from 0 to 96.

Refer to the chart below for the datasets each model was trained on and the accuracy each achieved. Model 31 performs consistently better in real-world applications.

Model id 96 is the default model and is a version of model 31 trained with a larger dataset built on the same queries and rules.

Classifier

Pass in a classifier object obtained from get_classifier(id). This saves load time and should be used when classifying multiple items with the same model id.
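The saving comes from loading a model once and reusing the loaded object. The sketch below illustrates that load-once pattern with stand-ins. load_model and get_cached_classifier are hypothetical names, and the dummy "model" only records how often it is loaded. It does not reproduce what get_classifier does internally.

```python
from functools import lru_cache

load_calls = []  # records every (expensive) model load

def load_model(model_id):
    """Hypothetical stand-in for reading a trained model from disk."""
    load_calls.append(model_id)
    return {"model_id": model_id}

@lru_cache(maxsize=None)
def get_cached_classifier(model_id):
    """Return the same model object for every request with a given model id."""
    return load_model(model_id)

# Requesting model 96 several times triggers only one load.
for _ in range(3):
    get_cached_classifier(96)
print(load_calls)  # [96]
```

With the documented functions, the equivalent is to call get_classifier(96) once and pass the result as classifier= to every classify call that uses model_id=96.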

Twitter API

Get a list of posts from a given username

def get_all_posts_for_user(user, limit=10)

Note: user is the Twitter user's @ handle, and the number of posts returned is limit * 10.
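The limit * 10 rule suggests that posts are fetched in pages of 10. That is an assumption, since the page does not say how requests are made. The sketch below shows the arithmetic with a hypothetical fetch_page callable standing in for the Twitter request. It also stops early if the account runs out of posts, so limit * 10 acts as an upper bound here.

```python
def collect_posts(fetch_page, limit=10, page_size=10):
    """Hypothetical pager: gather up to limit * page_size posts, page_size at a time.

    fetch_page(page_number) is an assumed callable that returns a list of post
    strings, or an empty list once the user has no more posts.
    """
    posts = []
    for page in range(limit):
        batch = fetch_page(page)
        if not batch:
            break  # stop early when the account runs out of posts
        posts.extend(batch[:page_size])
    return posts
```

With the default limit=10 this collects up to 100 posts, which matches the 100 posts analyse_user works through.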

Analyse a Twitter user for depressive likelihood

Goes through 100 of a user's most recent Twitter posts and returns a dictionary containing data on the likelihood that the user is depressed.

def analyse_user(user, model_id=96, posts=None, switchpoint=0.8, classification_switchpoint=0.96, save_as=None, filename=None, encoding="utf-8")

  • user is the @ of the user being analysed

  • model_id is the model id (refer to above chart) used to classify the individual posts

  • posts is a list of strings like those generated by get_all_posts_for_user

  • switchpoint is a float between 0 and 1. If the user's average positive likelihood is above it, the returned dictionary classifies them as depressed

  • classification_switchpoint serves the same function as switchpoint in the classify function

  • save_as is either "csv", "json" or "xlsx" and specifies the file type the data dictionary is saved as. save_as=None means that no file is generated

  • filename is the file the data is written to. If save_as is not None and filename=None, filename is automatically set to f"{user}_depression_statistics"

The dictionary that this function returns looks like:

```python
return_dict = {
    "model_id": model_id,
    "username": user,
    "total_posts": len(posts),
    "total_positive": total_positive,
    "total_negative": total_negative,
    "percent_positive": 0,
    "percent_negative": 0,
    "average_positive_likelihood": average_positive,
    "average_negative_likelihood": average_negative,
    "posts": [],
    "depressed": False,
    "switchpoint": switchpoint,
    "classification_switchpoint": classification_switchpoint,
}
```

average_positive_likelihood and average_negative_likelihood are floats giving the mean of the respective likelihoods across every post. Treat them as a rough estimate of the positive/negative likelihood of any post the user writes.
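The sketch below shows one way the dictionary's fields could be derived from per-post scores. It is not analyse_user's source. Several details are assumptions: the function name summarise_posts, the (post_text, positive_probability) input shape, percentages on a 0 to 100 scale, and the layout of each "posts" entry. The two thresholds follow the parameter descriptions above. classification_switchpoint labels each post, and switchpoint is compared with the average.

```python
def summarise_posts(user, post_probs, model_id=96, switchpoint=0.8,
                    classification_switchpoint=0.96):
    """Hypothetical sketch of building analyse_user's data dictionary.

    post_probs: list of (post_text, positive_probability) pairs (assumed shape).
    """
    total = len(post_probs)
    if total == 0:
        raise ValueError("need at least one post")
    positives = [p for _, p in post_probs]
    # Each post is labelled with classification_switchpoint, as classify() does with switchpoint.
    total_positive = sum(1 for p in positives if p > classification_switchpoint)
    total_negative = total - total_positive
    average_positive = sum(positives) / total
    # Assumes each post's negative likelihood is 1 - its positive likelihood.
    average_negative = sum(1.0 - p for p in positives) / total
    return {
        "model_id": model_id,
        "username": user,
        "total_posts": total,
        "total_positive": total_positive,
        "total_negative": total_negative,
        "percent_positive": 100 * total_positive / total,  # assumed 0-100 scale
        "percent_negative": 100 * total_negative / total,
        "average_positive_likelihood": average_positive,
        "average_negative_likelihood": average_negative,
        "posts": [{"text": t, "positive_likelihood": p} for t, p in post_probs],
        # The user-level flag compares the average against switchpoint, not individual posts.
        "depressed": average_positive > switchpoint,
        "switchpoint": switchpoint,
        "classification_switchpoint": classification_switchpoint,
    }
```

Under this sketch the two thresholds can disagree. If every post scores 0.9, no post is above the default classification_switchpoint of 0.96, but the 0.9 average exceeds the default switchpoint of 0.8, so the user is still flagged as depressed.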

Save a data dictionary

This function is used by analyse_user to save the resulting data dictionary to a file when save_as is not None.

def save_dict(user, save_as, return_dict, filename=None, encoding="utf-8")

  • user is the @ of the user the data dict concerns

  • save_as and filename are the same as in analyse_user

  • return_dict is the data dictionary generated by analyse_user
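For illustration, here is a minimal sketch of a save_dict-style function covering the "csv" and "json" cases. save_dict_sketch is a hypothetical name, and several details are assumptions: the file extension is appended to filename, the CSV is written as key/value rows with nested values stored as JSON text, and xlsx is left out because it needs a third-party writer. The default filename follows the rule documented for analyse_user.

```python
import csv
import json

def save_dict_sketch(user, save_as, return_dict, filename=None, encoding="utf-8"):
    """Hypothetical re-creation of save_dict for the "csv" and "json" cases only."""
    if save_as is None:
        return None  # mirrors analyse_user: no file when save_as is None
    if filename is None:
        filename = f"{user}_depression_statistics"  # documented default name
    path = f"{filename}.{save_as}"  # assumption: the extension is appended to filename
    if save_as == "json":
        with open(path, "w", encoding=encoding) as f:
            json.dump(return_dict, f, indent=2)
    elif save_as == "csv":
        with open(path, "w", encoding=encoding, newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["key", "value"])
            for key, value in return_dict.items():
                # Nested values such as the posts list are stored as JSON text in one cell.
                cell = json.dumps(value) if isinstance(value, (list, dict)) else value
                writer.writerow([key, cell])
    elif save_as == "xlsx":
        raise NotImplementedError("xlsx needs a third-party writer such as openpyxl")
    else:
        raise ValueError('save_as must be "csv", "json", "xlsx" or None')
    return path
```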
