Home
Welcome to the depressionAnalysis wiki!
Classification
classify(text, mode="string", switchpoint=0.95, model_id=96, classifier=None)
Classifies a string as either "Positive" or "Negative" for depression.
Modes
- string - returns output as either "Positive" or "Negative"
- int - returns output as either 1 for positive or 0 for negative
- probabilities - returns a list of tuples containing the value and the associated probability
Switchpoint
A value between 0 and 1. The positive probability must be above this value for the text to register as positive.
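How mode and switchpoint interact can be sketched roughly as below. This is an illustration only, not the library's implementation: `label_from_probability` is a hypothetical helper, and it assumes a strict "above" comparison, that the negative probability is the complement of the positive one, and that the probabilities tuples are laid out as (label, probability).

```python
def label_from_probability(positive_probability, mode="string", switchpoint=0.95):
    """Hypothetical helper: turn a positive-class probability into classify-style output."""
    if not 0.0 <= switchpoint <= 1.0:
        raise ValueError("switchpoint must be between 0 and 1")

    # Assumption: only a probability strictly above the switchpoint is positive.
    is_positive = positive_probability > switchpoint

    if mode == "string":
        return "Positive" if is_positive else "Negative"
    if mode == "int":
        return 1 if is_positive else 0
    if mode == "probabilities":
        # Assumption: (label, probability) tuples; the real layout may differ.
        return [
            ("Positive", positive_probability),
            ("Negative", 1.0 - positive_probability),
        ]
    raise ValueError(f"unknown mode: {mode!r}")
```

With the default switchpoint of 0.95, a post scored at 0.90 would come back as "Negative"; lowering the switchpoint to 0.8 would flip it to "Positive".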
Model id
Id of the model used to classify the text. An int between 0 and 96.
Refer to the chart below for the datasets each model was trained on and their accuracies. Model 31 performs consistently better in real-world application.
Model id 96 is the default model. It is a version of model 31 trained with a larger dataset built on the same queries and rules.
Classifier
A classifier object obtained from get_classifier(id). Passing one saves on load time and should be done when classifying multiple items with the same model id.
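A minimal sketch of that reuse pattern follows. `classify_many` is a hypothetical wrapper, not part of the library; it takes `classify` and `get_classifier` as arguments so that no import path is assumed, and it assumes `classify` accepts the keyword arguments shown in the signature above.

```python
def classify_many(texts, classify, get_classifier, model_id=96, **kwargs):
    """Classify several texts while loading the classifier only once.

    `classify` and `get_classifier` are the library functions, passed in by
    the caller so this sketch does not guess at how they are imported.
    """
    # Load the classifier a single time instead of once per text.
    classifier = get_classifier(model_id)
    return [
        classify(text, model_id=model_id, classifier=classifier, **kwargs)
        for text in texts
    ]
```

Extra keyword arguments such as mode or switchpoint are forwarded unchanged to each classify call.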
Twitter API
Get a list of posts from a given username
def get_all_posts_for_user(user, limit=10)
Note: user is the @ of the Twitter user, and the number of posts returned is limit * 10.
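Because the post count scales as limit * 10, the limit needed for a target number of posts can be worked out with a small calculation. `limit_for_post_count` below is a hypothetical helper, and it assumes the function returns up to limit * 10 posts for any positive integer limit.

```python
def limit_for_post_count(post_count):
    """Hypothetical helper: smallest limit such that limit * 10 >= post_count."""
    if post_count <= 0:
        raise ValueError("post_count must be positive")
    # Integer ceiling division by 10.
    return (post_count + 9) // 10
```

Under that assumption, the default limit=10 corresponds to 100 posts, and asking for 25 posts would need limit=3.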
Analyse a Twitter user for depressive likelihood
Goes through 100 of a user's most recent Twitter posts and returns a dictionary containing data on the likelihood of depression in the user.
def analyse_user(user, model_id=96, posts=None, switchpoint=0.8, classification_switchpoint=0.96, save_as=None, filename=None, encoding="utf-8")
- user is the @ of the user being analysed
- model_id is the id of the model (refer to the chart above) used to classify the individual posts
- posts is a list of strings like those generated by get_all_posts_for_user
- switchpoint is a float between 0 and 1; if the user's average positive likelihood is above it, the returned dictionary classifies them as depressed
- classification_switchpoint serves the same function as switchpoint in the classify function
- save_as is either "csv", "json" or "xlsx" and specifies the file type the data dictionary should be saved to; save_as=None means that no file will be generated
- filename is the file the data should be written to; if save_as is not None and filename=None, filename is automatically set to f"{user}_depression_statistics"
The dictionary that this function returns looks like:
`
return_dict = {
    "model_id": model_id,
    "username": user,
    "total_posts": len(posts),
    "total_positive": total_positive,
    "total_negative": total_negative,
    "percent_positive": 0,
    "percent_negative": 0,
    "average_positive_likelihood": average_positive,
    "average_negative_likelihood": average_negative,
    "posts": [],
    "depressed": False,
    "switchpoint": switchpoint,
    "classification_switchpoint": classification_switchpoint
}
`
Average positive likelihood and average negative likelihood are floats: each is the mean of the corresponding likelihood across every post. Treat them as a rough estimate of the positive/negative likelihood of any post written by the user.
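One way the summary fields might be derived from per-post probabilities is sketched below. `summarise_posts` is a hypothetical function, not the library's code. It assumes each post contributes a single positive probability, that the negative likelihood is its complement, that percentages are on a 0 to 100 scale, and that both switchpoints use a strict "above" comparison.

```python
def summarise_posts(positive_probabilities, switchpoint=0.8,
                    classification_switchpoint=0.96):
    """Hypothetical aggregation of per-post positive probabilities."""
    total = len(positive_probabilities)
    if total == 0:
        raise ValueError("at least one post is required")

    # A post counts as positive when its probability clears the
    # per-post (classification) switchpoint.
    total_positive = sum(
        1 for p in positive_probabilities if p > classification_switchpoint
    )
    total_negative = total - total_positive
    average_positive = sum(positive_probabilities) / total

    return {
        "total_posts": total,
        "total_positive": total_positive,
        "total_negative": total_negative,
        # Assumption: percentages are expressed on a 0-100 scale.
        "percent_positive": 100 * total_positive / total,
        "percent_negative": 100 * total_negative / total,
        "average_positive_likelihood": average_positive,
        # Assumption: negative likelihood is the complement of positive.
        "average_negative_likelihood": 1 - average_positive,
        # The user-level flag compares the average to the user switchpoint.
        "depressed": average_positive > switchpoint,
    }
```

Note that the two thresholds act independently in this sketch: a user can be flagged as depressed from a high average even when only a few individual posts clear the stricter per-post threshold.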
Save a data dictionary
This function is used by analyse_user to save the resulting data dictionary to a file if save_as is not None.
def save_dict(user, save_as, return_dict, filename=None, encoding="utf-8")
- user is the @ of the user the data dictionary concerns
- save_as and filename are the same as in analyse_user
- return_dict is the data dictionary generated by analyse_user
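The saving behaviour described above could look roughly like the sketch below. `save_dict_sketch` is a hypothetical stand-in, not the library's save_dict: it covers only "json" and "csv" using the standard library, leaves out "xlsx", and assumes the file extension is appended to filename and that the CSV is laid out as key/value rows.

```python
import csv
import json


def save_dict_sketch(user, save_as, return_dict, filename=None, encoding="utf-8"):
    """Hypothetical stand-in for save_dict handling "json" and "csv" only."""
    if filename is None:
        # Default name described in the analyse_user parameters.
        filename = f"{user}_depression_statistics"
    # Assumption: the extension is appended to the given filename.
    path = f"{filename}.{save_as}"

    if save_as == "json":
        with open(path, "w", encoding=encoding) as handle:
            json.dump(return_dict, handle, indent=2)
    elif save_as == "csv":
        # Assumption: one "key,value" row per dictionary entry.
        with open(path, "w", encoding=encoding, newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["key", "value"])
            for key, value in return_dict.items():
                writer.writerow([key, value])
    else:
        raise ValueError(f"unsupported save_as in this sketch: {save_as!r}")
    return path
```

Returning the path is a choice made for this sketch so that callers can locate the file; the source does not say what save_dict returns.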
