# Subreddit Identification function ('Shower Thought' or 'Crazy Idea'?) 

The second-best-performing model (Naive Bayes with three features) is used in this function in the interest of time. The best-performing model (Boosting Classifier with three features) has a much longer run time for each prediction.

In [30]:
def subreddit_id(post):
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.compose import make_column_transformer
    # from sklearn.ensemble import GradientBoostingClassifier  # best model, but slower
    from sklearn.pipeline import make_pipeline
    from sklearn.naive_bayes import MultinomialNB

    # Training data: post text plus two length-based features
    df_subreddits = pd.read_csv('./00_cleaned_data/subreddits.csv')
    X = df_subreddits[['full_text', 'letter_count', 'word_count']]
    y = df_subreddits['subreddit']

    # Bag of words (unigrams and bigrams) on the text; numeric columns pass through unchanged.
    # min_df=1 keeps every term, as min_df=0 did; newer scikit-learn versions may reject the integer 0.
    cv = CountVectorizer(max_df=0.8, min_df=1, ngram_range=(1,2))
    text_vectorizer = make_column_transformer((cv, 'full_text'), remainder='passthrough', n_jobs=-1, verbose_feature_names_out=False)
    nb = MultinomialNB(alpha=0.1)
    pipe = make_pipeline(text_vectorizer, nb)
    pipe.fit(X, y)  # note: the model is refit on every call

    # Build the same three features for the new post
    X_post = pd.DataFrame([[ post, len(post), len(post.split(' ')) ]], columns=['full_text', 'letter_count', 'word_count'])
    # Print the predicted label; print() returns None, so there is nothing useful to return
    print(pipe.predict(X_post))
    

# Testing function

Subreddit posts from today (never seen by the model) are used to test the function.
<br>
(https://www.reddit.com/r/Showerthoughts/, https://www.reddit.com/r/CrazyIdeas/)

In [40]:
#Correctly identified!

subreddit_id('whenever a court undoes a law by saying its unconstitutional, the government owes reparations to everyone who that law caused problems for')

['crazyideas']


In [41]:
#Correctly identified!

subreddit_id('On days when the zoo is closed, take one of the animals on a tour to see th other animals. Then give them a free T-shirt.')

['crazyideas']


In [42]:
#Correctly identified!

subreddit_id('Rename the Asian and European halve of Istanbul to Eastanbul and Westanbul')

['crazyideas']


In [43]:
#Correctly identified!

subreddit_id('A golf course inside a heavily wooded area. Do you think you\'re a big shot just because you can hit a ball across an open field? Up the ante by playing 18 holes in a forest!')

['crazyideas']


In [44]:
#Correctly identified!

subreddit_id('Have bungie release a PS$ exclusive Halo adjacent game. With the purchase of bungie by playstation. I\'m thinking they could make a Halo-like shooter. They could call it schmalo, the protag would be called mister shcief. The setting is on a schmalo octagon fighting against the schmovenant.')

['crazyideas']


In [38]:
#Correctly identified!

subreddit_id('A 36-year-old has spent 18 years as an adult, so they are an adult adult')

['showerthoughts']


In [39]:
#Incorrectly identified (actually from Showerthoughts)

subreddit_id('Every year wall-e becomes more and more realistic.')

['crazyideas']


In [45]:
#Incorrectly identified (actually from Showerthoughts)

subreddit_id('people who are depressed because of their financial struggles can\'t get therapy to help because of how expensive it is')

['crazyideas']


In [46]:
#Correctly identified!

subreddit_id('If your fridge is full of beer you look like an alcoholic, but an alcoholic wouldn\'t have any in their fridge. Edit: seems to be based on culture you grew up in. If you grew up where there were 3 places to get liquor for every McDonald\'s then you probably don\'t stock a lot. If you grew up in the country or in the city where it was a hassle to go get it you tend to stock up a lot of you are an alcoholic. Thanks for the responses everyone!!')

['showerthoughts']


In [47]:
#Correctly identified!

subreddit_id('You can tell how bad inflation has been over the years by how much money people steal in heist movies of their times.')

['showerthoughts']


In [48]:
#Correctly identified!

subreddit_id('More than a six pack of beer in the fridge starts to look like a drinking problem, but a whole wine cellar is considered very classy')

['showerthoughts']
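
The eleven test cells above can be tallied in a few lines. This is a small sketch, not part of the original notebook: the `(true label, predicted label)` pairs are transcribed by hand from the cell comments and printed outputs, and the variable names are illustrative.

```python
from collections import Counter

# (true label, predicted label) for each test cell above, in document order
results = [
    ('crazyideas', 'crazyideas'),          # In [40]
    ('crazyideas', 'crazyideas'),          # In [41]
    ('crazyideas', 'crazyideas'),          # In [42]
    ('crazyideas', 'crazyideas'),          # In [43]
    ('crazyideas', 'crazyideas'),          # In [44]
    ('showerthoughts', 'showerthoughts'),  # In [38]
    ('showerthoughts', 'crazyideas'),      # In [39], misclassified
    ('showerthoughts', 'crazyideas'),      # In [45], misclassified
    ('showerthoughts', 'showerthoughts'),  # In [46]
    ('showerthoughts', 'showerthoughts'),  # In [47]
    ('showerthoughts', 'showerthoughts'),  # In [48]
]

correct = sum(true == pred for true, pred in results)
accuracy = correct / len(results)
errors = Counter((true, pred) for true, pred in results if true != pred)

print(f'{correct}/{len(results)} correct ({accuracy:.1%})')  # 9/11 correct (81.8%)
print(dict(errors))  # {('showerthoughts', 'crazyideas'): 2}
```

On this small sample, both errors are Showerthoughts posts labeled as CrazyIdeas; eleven posts are too few to draw firm conclusions about the model.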
