## OVERVIEW OF OSEMiN

<img src='https://raw.githubusercontent.com/jirvingphd/fsds_100719_cohort_notes/master/images/OSEMN.png' width=800>

<center><a href="https://www.kdnuggets.com/2018/02/data-science-command-line-book-exploring-data.html"> 
    </a></center>


> <font size=2em>The Data Science Process we'll be using during this section is OSEMiN (pronounced "OH-sum", rhymes with "possum").  This is the most straightforward of the Data Science Processes discussed so far.  **Note that during this process, just like the others, the stages often blur together.**  It is completely acceptable (and **often a best practice!**) to float back and forth between stages as you learn new things about your problem, dataset, requirements, etc.  
It's quite common to get to the modeling step, realize that you need to scrub your data a bit more or engineer a different feature, and jump back to the "Scrub" stage, or to go all the way back to the "Obtain" stage when you realize your current data isn't sufficient to solve the problem. 
As with any of these frameworks, *OSEMiN is meant to be treated as a set of guidelines, not law.* 
</font>


### OSEMN DETAILS

**OBTAIN**

- This step involves understanding stakeholder requirements, gathering information on the problem, and finally sourcing data that we think will be necessary for solving this problem. 

**SCRUB**

- During this stage, we'll focus on preprocessing our data.  Important steps such as identifying and removing null values, dealing with outliers, normalizing data, and feature engineering/feature selection are handled around this stage.  The line between this stage and the _Explore_ stage really blurs, as it is common to realize that certain columns require cleaning or preprocessing only as a result of the visualizations and explorations done during Step 3.  

- Note that although technically, categorical data should be one-hot encoded during this step, in practice, it's usually done after data exploration.  This is because it is much less time-consuming to visualize and explore a few columns containing categorical data than it is to explore many different dummy columns that have been one-hot encoded. 
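
As a minimal sketch of the one-hot encoding point above, assuming a small made-up DataFrame (the column names `color` and `price` are hypothetical and not part of any project dataset), `pd.get_dummies` shows how one categorical column expands into several indicator columns, which is why exploring before encoding tends to be easier:

```python
import pandas as pd

# Hypothetical toy data; the column names are for illustration only.
toy = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# One categorical column becomes one indicator column per category.
encoded = pd.get_dummies(toy, columns=["color"])

print(toy.shape)       # 2 columns before encoding
print(encoded.shape)   # 4 columns after: price plus 3 color indicators
print(sorted(encoded.columns))
```

With only three categories the frame already doubles in width; a column with dozens of categories would be much harder to plot and inspect once encoded.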

**EXPLORE**

- This step focuses on getting to know the dataset you're working with. As mentioned above, this step tends to blend with the _Scrub_ step.  During this step, you'll create visualizations to really get a feel for your dataset.  You'll focus on things such as understanding the distribution of different columns, checking for multicollinearity, and other tasks like that.  If your project is a classification task, you may check the balance of the different classes in your dataset.  If your problem is a regression task, you may check that the dataset meets the assumptions necessary for a regression task.  

- At the end of this step, you should have a dataset ready for modeling that you've thoroughly explored and are extremely familiar with.  
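
The exploration checks described above (distributions, class balance, multicollinearity) can be sketched without any plotting. This is only an illustration on synthetic data: the columns `x1`, `x2`, `x3`, and `label` are invented, and the 0.75 cutoff is a rule of thumb rather than a fixed standard.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
toy = pd.DataFrame({
    "x1": x1,
    "x2": 2 * x1 + rng.normal(scale=0.1, size=n),  # nearly a copy of x1
    "x3": rng.normal(size=n),                      # independent of x1
    "label": rng.choice(["a", "b"], size=n, p=[0.8, 0.2]),
})

# Distributions: scale, range, and quartiles of each numeric column.
print(toy.describe())

# Class balance: share of each class in the (hypothetical) target.
print(toy["label"].value_counts(normalize=True))

# Multicollinearity: absolute pairwise correlations between features.
corr = toy[["x1", "x2", "x3"]].corr().abs()

# Keep each pair once (upper triangle) and flag those above the cutoff.
threshold = 0.75
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
high_pairs = upper.stack()
high_pairs = high_pairs[high_pairs > threshold]
print(high_pairs)
```

Here only the `x1`/`x2` pair is flagged, because `x2` was constructed from `x1`.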

**MODEL**

- This step, as with the last two frameworks, is also pretty self-explanatory. It consists of building and tuning models using all the tools you have in your data science toolbox.  In practice, this often means defining a threshold for success, selecting machine learning algorithms to test on the project, and tuning the ones that show promise to try and increase your results.  As with the other stages, it is both common and accepted to realize something, jump back to a previous stage like _Scrub_ or _Explore_, and make some changes to see how it affects the model.  
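
A minimal sketch of that loop (define a success threshold, hold out test data, fit, compare) might look like the following. The data is synthetic, and the choice of `LinearRegression`, the R² metric, and the 0.8 threshold are all assumptions made for illustration, not requirements of the process.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data for illustration only.
rng = np.random.default_rng(27)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Hold out 25% of the rows so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=27
)

# Hypothetical threshold for success, decided before modeling.
success_threshold = 0.8

# Fit a baseline model and score it on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
test_r2 = r2_score(y_test, model.predict(X_test))

print(f"test R^2 = {test_r2:.3f}")
print(f"meets threshold: {test_r2 >= success_threshold}")
```

If the baseline misses the threshold, that is the signal to tune the model or jump back to _Scrub_ or _Explore_.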

**iNTERPRET**

- During this step, you'll interpret the results of your model(s), and communicate results to stakeholders.  As with the other frameworks, communication is incredibly important! During this stage, you may come to realize that further investigation or more data is needed.  That's totally fine: figure out what's needed, go get it, and start the process over! If your results are satisfactory to all stakeholders involved, you may also go from this stage right into productionizing your model and automating processes necessary to support it.  





## PROCESS CHECKLIST


> Keep in mind that it is normal to jump between the OSEMN phases and some of them will blend together, like SCRUB and EXPLORE.

1. **[OBTAIN](#OBTAIN)**
    - Import data, inspect, check for datatypes to convert and null values
    - Display header and info.
    - Drop any unneeded columns, if known (`df.drop(['col1','col2'],axis=1,inplace=True)`)
    <br><br>


2. **[SCRUB](#SCRUB)**
    - Recast data types, identify outliers, check for multicollinearity, normalize data
    - Check and cast data types
        - [ ] Check for numbers that are stored as objects (`df.info()`, `df.describe()`)
            - When converting to numbers, look for odd values (like many 0's) or strings that can't be converted.
            - Decide how to deal with weird/null values (`df['col'].unique()`, `df.isna().sum()`)
            - `df['col_with_nulls'].fillna('fill_value')`, `df.replace()`
        - [ ] Check for categorical variables stored as integers.
            - May be easier to tell when you make a scatter plot or `pd.plotting.scatter_matrix()`
            
    - [ ] Check for missing values (`df.isna().sum()`)
        - Can drop rows or columns
        - For missing numeric data: fill with the median, or bin/convert to categorical
        - For missing categorical data: make NaN own category OR replace with most common category
    - [ ] Check for multicollinearity
        - Use seaborn to make correlation matrix plot 
        - A good rule of thumb is that any correlation over 0.75 is high; remove the variable that is highly correlated with the largest number of other variables
    - [ ] Normalize data (may want to do after some exploring)
        - Most popular is Z-scoring (but won't fix skew) 
        - Can log-transform to fix skewed data
    
    
3. **[EXPLORE](#EXPLORE)**
    - [ ] Check distributions, outliers, etc.
    - [ ] Check scales, ranges (`df.describe()`)
    - [ ] Check histograms to get an idea of distributions (`df.hist()`) and data transformations to perform.
        - Can also do kernel density estimates
    - [ ] Use scatter plots to check for linearity and possible categorical variables (`df.plot.scatter("x","y")`)
        - categoricals will look like vertical lines
    - [ ] Use `pd.plotting.scatter_matrix(df)` to visualize possible relationships
    - [ ] Check for linearity.
   
   
4. **[MODEL](#MODEL)**

    - **Fit an initial model:** 
        - Run an initial model and get results

    - **Holdout validation / Train/test split**
        - use sklearn `train_test_split`
    
    
5. **[iNTERPRET](#iNTERPRET)**
    - **Assessing the model:**
        - Assess parameters (slope, intercept)
        - Check if the model explains the variation in the data (RMSE, F, R_square)
        - *Are the coeffs, slopes, intercepts in appropriate units?*
        - *What's the impact of collinearity? Can we ignore it?*
        <br><br>
    - **Revise the fitted model**
        - Multicollinearity is a big issue for linear regression and cannot be fully removed
        - Use the predictive ability of the model to test it (like R2 and RMSE)
        - Check for missed non-linearity
        
       
6. **Interpret final model and draw >=3 conclusions and recommendations from dataset**
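
The SCRUB items in the checklist above (cast types, handle missing values, transform and normalize) can be sketched roughly as follows. The toy DataFrame and its `sqft` and `price` columns are hypothetical, and filling with the median is just one of the options the checklist lists.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "sqft": ["1200", "1500", "?", "900", "2000"],  # numbers stored as strings
    "price": [200000.0, 260000.0, 240000.0, np.nan, 1500000.0],
})

# Cast to numeric; strings that can't be converted (like "?") become NaN.
toy["sqft"] = pd.to_numeric(toy["sqft"], errors="coerce")

# Count missing values per column.
print(toy.isna().sum())

# Fill missing numeric values with each column's median.
toy = toy.fillna(toy.median())

# Log-transform the skewed column, then z-score the result.
toy["log_price"] = np.log(toy["price"])
toy["log_price_z"] = (
    toy["log_price"] - toy["log_price"].mean()
) / toy["log_price"].std()

print(toy.round(2))
```

Z-scoring on its own only rescales a column, which is why the log transform is applied first when the goal is to reduce skew.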

<div style="display:block;border-bottom:solid red 3px;padding:1.4em;color:red;font-size:30pt;display:inline-block;line-height:1.5em;">
DELETE THIS CELL AND EVERYTHING ABOVE FROM YOUR FINAL NOTEBOOK
</div>

# Final Project Submission

Please fill out:
* Student name: 
* Student pace: self paced / part time / full time:
* Scheduled project review date/time: 
* Instructor name: 
* Blog post URL:
* Video of 5-min Non-Technical Presentation:

## TABLE OF CONTENTS 

*Click to jump to matching Markdown Header.*<br><br>

<font size=3rem>
    
- **[Introduction](#INTRODUCTION)**<br>
- **[OBTAIN](#OBTAIN)**<br>
- **[SCRUB](#SCRUB)**<br>
- **[EXPLORE](#EXPLORE)**<br>
- **[MODEL](#MODEL)**<br>
- **[iNTERPRET](#iNTERPRET)**<br>
- **[Conclusions/Recommendations](#CONCLUSIONS-&-RECOMMENDATIONS)**<br>
</font>
___

# INTRODUCTION

> Explain the point of your project and what question you are trying to answer with your modeling.



# OBTAIN

In [86]:
# Import all packages to be used
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import glob

import nltk
from nltk import FreqDist, word_tokenize, regexp_tokenize, TweetTokenizer
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer

import string

from wordcloud import WordCloud

%matplotlib inline

# Set random seed
np.random.seed(27)

In [2]:
pd.options.display.max_colwidth = None

In [5]:
path = r'data'
all_files = glob.glob(os.path.join(path, "*.csv"))
all_files

['data/stoic.csv',
 'data/shine-calm-anxiety-stress.csv',
 'data/moodfit.csv',
 'data/slumber-fall-asleep-insomnia.csv',
 'data/pzizz-sleep-nap-focus.csv',
 'data/headspace-meditation-sleep.csv',
 'data/breathe2relax.csv',
 'data/fabulous-daily-routine-planner.csv',
 'data/app-name-id.csv',
 'data/breethe-meditation-sleep.csv',
 'data/talkspace-therapy-counseling.csv',
 'data/smiling-mind.csv',
 'data/calm.csv',
 'data/rootd-panic-attack-relief.csv',
 'data/reflectly-journal-ai-diary.csv',
 'data/whats-up-a-mental-health-app.csv',
 'data/betterme-calm-sleep-meditate.csv',
 'data/relax-meditation-guided-mind.csv',
 'data/insight-timer-meditation-app.csv',
 'data/cbt-i-coach.csv',
 'data/meditopia-meditation-breathe.csv',
 'data/moodtools-depression-aid.csv',
 'data/moodmission.csv',
 'data/noisli.csv',
 'data/mindshift-cbt-anxiety-relief.csv',
 'data/minddoc-your-companion.csv',
 'data/ten-percent-happier-meditation.csv',
 'data/innerhour-self-care-therapy.csv',
 'data/sanvello-anxiety-

In [9]:
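# Remove two files by position: the second-to-last entry, then index 8
# ('data/app-name-id.csv' above). glob order can vary between machines.
all_files.pop(-2)
all_files.pop(8)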
all_files

['data/stoic.csv',
 'data/shine-calm-anxiety-stress.csv',
 'data/moodfit.csv',
 'data/slumber-fall-asleep-insomnia.csv',
 'data/pzizz-sleep-nap-focus.csv',
 'data/headspace-meditation-sleep.csv',
 'data/breathe2relax.csv',
 'data/fabulous-daily-routine-planner.csv',
 'data/breethe-meditation-sleep.csv',
 'data/talkspace-therapy-counseling.csv',
 'data/smiling-mind.csv',
 'data/calm.csv',
 'data/rootd-panic-attack-relief.csv',
 'data/reflectly-journal-ai-diary.csv',
 'data/whats-up-a-mental-health-app.csv',
 'data/betterme-calm-sleep-meditate.csv',
 'data/relax-meditation-guided-mind.csv',
 'data/insight-timer-meditation-app.csv',
 'data/cbt-i-coach.csv',
 'data/meditopia-meditation-breathe.csv',
 'data/moodtools-depression-aid.csv',
 'data/moodmission.csv',
 'data/noisli.csv',
 'data/mindshift-cbt-anxiety-relief.csv',
 'data/minddoc-your-companion.csv',
 'data/ten-percent-happier-meditation.csv',
 'data/innerhour-self-care-therapy.csv',
 'data/sanvello-anxiety-depression.csv',
 'data/h
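
Because `glob` does not guarantee file order, removing files by position can drop the wrong file on another machine. A more robust alternative is to filter by file name, sketched below. The short `example_files` list is a stand-in for `all_files`, and only `app-name-id.csv` is excluded, since the second file removed above cannot be identified from the truncated listing.

```python
import os

# Hypothetical short list standing in for the `all_files` list above.
example_files = [
    "data/stoic.csv",
    "data/app-name-id.csv",
    "data/calm.csv",
]

# Assumption: the files to skip are known by name.
exclude = {"app-name-id.csv"}

# Keep every path whose base name is not in the exclusion set.
review_files = [
    f for f in example_files if os.path.basename(f) not in exclude
]

print(review_files)
```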

In [10]:
df_list = []

for filename in all_files:
    temp_df = pd.read_csv(filename)
    df_list.append(temp_df)

In [11]:
df = pd.concat(df_list, axis=0, ignore_index=True)
df

Unnamed: 0,rating,date,title,isEdited,review,userName,app_name,app_id,developerResponse
0,5,2020-05-15 21:12:51,A non judgmental guide to a better self,False,"This app just keeps getting better and better. While using stoicism as a framework, this app has really been a way for me to practice personal reflection and awareness, cultivating better emotional wellness. Easily accessible techniques for gratitude, journaling, meditation, and breathing exercises without the pressure of any one thing being a requirement. The app guides you to those tools through a pleasant sequence without any pressure to use specific techniques. It’s so gentle. If you want to journal, then journal. If you want to meditate, then do that. And if you don’t, that’s ok too. It’s totally up to you. I find tremendous value in the simple act of making these sequences a part of my daily morning and evening routine; from simply opening the app and assessing how I feel to what my priority for the day is to how productive I felt. The app now even guides personal reflection towards specific categories. If in the morning I say I’m going to focus on reading, in the evening the app will then guide me to reflect on recalling what I read that day. So simple, but so helpful. Excellent user experience, an essential part of my every day routine now.",CheeseNinjaIPA98,stoic,1312926037,
1,3,2021-04-20 12:43:19,Maybe I don't know how to use this app?,False,"I'm not really sure how I feel about this app. While it does ask you questions/give you prompts to get you talking about things, sometimes the questions are weird. I had one this morning, somewhere along the lines of ""what are you then, a mere body? a ration being."" And I'm staring at my screen like how do I even answer this if I don't understand the question? There's other ones that are similar and because I don't understand what it's asking me to write, sometimes I'll just put ""lol"" so I can move to the next page. Then when you're done with your questions it will ask if you feel better and mostly my answer has been no because the questions it gave me were weird. You can pick what questions you'd like to be asked everyday but none of them really jump out to me as something I'd want to answer daily besides the one sentence summary of your day but even that is whatever. I do like that it tracks your mood, would be cool to have another tracker for what is causing your mood instead of the sad to happy face scale. The quotes are cool too. I've been using the app for about 2 weeks now and I feel like maybe twice out of the 14 days, it's actually asked me something relevant. More often than not, I'm using the blank page entries instead but I could just write those in my actual journal. Perhaps this app isn't for me, but it seems to work for people.",Jay2DaNaay,stoic,1312926037,
2,3,2020-10-31 00:11:52,Good For Simple Journaling,False,"I am enjoying the free version of the app, (so this is coming from that point of view.) The daily prompts are repetitive but the paid version promises more “reflective questions” and “quotes”. (I wonder, how different could it be?) I do like the daily “choose your own/write your own ”mantra/saying/quote/etc.” There are quotes/reflections to use as journal prompts, if you just feel like typing. The daily mood/goal prompts kind of deter me away from using the app. It almost seems like a chore, mostly because not every answer for the “mood/goal” question warrants the next prompt which is usually, the exact same question about what we plan to do about such-and-such from the previous answers. Everything is predictable. Most of the time I just find myself lumping my thoughts into the closest category, or rushing by multiple prompts to get to the writing part and find I am uninspired. This has app has potential but I feel there are two different ideas clashing here. I love inspiration when I want to write something, but if I want to journal purely for self-reflection and self/improvement I’d like a little more depth, less out-of-context quotes from other people. Either way, there was a lot of work put into this, and a need for it too!",SilenceRecited,stoic,1312926037,
3,5,2021-04-03 14:27:52,yuhhhhhh,False,"I downloaded this app last week and it’s been on my home page since then. I’ve only started using this app a few days ago and OMG😳 This app is so beautiful. I’m only writing a review just so I can ask if you guys can put a password for the journal part because my mom looks through my phone and I don’t want her to read the things I write in there💀 Anyways, I do actually really love this app and its perfect from the bottom of those codes and to the top of the icon. It’s iconic honestly. Not to forget the fact that it’s been lifting burdens and weights off of my back when I type everything in details. It’s like a friend that I’ve always wanted, which is sad because none of my real (alive) friends are there for me and asks me how I am like how stoic is🥲 Ever since the app asked me the first question, I’ve poured allllll of the things I wanted to say and in details too. I- this app is like that one friend that you have that’s always there for you no matter what and genuinely cares for you so deeply and that you’re comfortable being around them even though you’ve only met last week. yknow? this app is my twin flame.",kytedot,stoic,1312926037,
4,5,2020-11-24 01:18:05,This is an amazing app,False,"I typically do not like journaling on my phone however, I stumbled upon this app and gave it a try and I absolutely love it. I wasn’t sure if I wanted to buy a membership but I did because I wanted to have full access to the entire app to be able to really see if it was worth it. So far, I have used the app in the morning and evening. I wanted to be able to track my mood to get a good sense of what really bothers me and be able to journal. I find that it really helps me find peace and be able to find happy moments in my day when it is hard to find sometimes. I love being able to find ways to be grateful and this really helps me find those moments in the day when you’ve been so stressed and all you can think of is the negative parts and then turning it around to find those moments that stand out that are positive. Even during times of stress I have been able to see the positive. I’ve been able to take a step back and really look at each situation as a learning experience and sometimes it is hard to do when you are battling depression, anxiety and PTSD. Definitely recommend this app!!",Chryssee,stoic,1312926037,
...,...,...,...,...,...,...,...,...,...
28930,5,2020-07-22 18:55:03,J’adore!,False,"Vraiment très bien faite, avec des programmes top. Ça fait 3 ans que je ne peux plus m’en passer!",thewinewifey,mindfulness-with-petit-bambou,941222646,
28931,5,2020-07-18 03:00:01,Simplemente genial!,False,"Maravillosa aplicación, llena de ayudas y acompañamiento para mantenerte “enganchado”. La mejor manera de aprender a meditar! Felicitaciones",Pepeclas,mindfulness-with-petit-bambou,941222646,
28932,5,2020-06-05 22:38:39,Muy buena,False,"Me encanta esta aplicación, es genial como te ayuda a canalizar las meditaciones. Es didáctica amena y sobre todo muy amorosa su manera de canalizar las meditaciones \nPara los que como yo estamos comenzando es maravilloso como poco a poco te va adentrando a un mundo marsvilloso. Súper recomendada",martika11,mindfulness-with-petit-bambou,941222646,
28933,5,2020-04-08 11:56:58,Gracias,False,Descubriendo esta herramienta,Pericantar,mindfulness-with-petit-bambou,941222646,


In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28935 entries, 0 to 28934
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   rating             28935 non-null  int64 
 1   date               28935 non-null  object
 2   title              28935 non-null  object
 3   isEdited           28935 non-null  bool  
 4   review             28934 non-null  object
 5   userName           28935 non-null  object
 6   app_name           28935 non-null  object
 7   app_id             28935 non-null  int64 
 8   developerResponse  4183 non-null   object
dtypes: bool(1), int64(2), object(6)
memory usage: 1.8+ MB
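
The `date` column above has dtype `object`, meaning the timestamps are stored as strings. A minimal sketch of casting them with `pd.to_datetime`, using two date strings copied from the preview above (the `sample` frame is a stand-in, and the format string is an assumption based on those visible values):

```python
import pandas as pd

# Two date strings copied from the DataFrame preview above.
sample = pd.DataFrame(
    {"date": ["2020-05-15 21:12:51", "2021-04-20 12:43:19"]}
)

# Parse the strings into real timestamps.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d %H:%M:%S")

print(sample.dtypes)
print(sample["date"].dt.year.tolist())
```

Once the column is a datetime type, the `.dt` accessor makes it possible to group or filter reviews by year, month, or weekday.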


In [47]:
df[df['review'].isna()]

Unnamed: 0,rating,date,title,isEdited,review,userName,app_name,app_id,developerResponse
11120,4,2021-01-23 13:54:34,Trying to remove my review after request for refund was accepted,True,,Xwave500,talkspace-therapy-counseling,661829386,"{'id': 20485547, 'body': 'Hi there, we are so sorry to hear of your less than optimal experience. Someone from our team would love to look into what happened and do what we can to make this right by you. Please send us an email at Feedback@Talkspace.com, we look forward to hearing from you. ', 'modified': '2021-01-18T20:54:59Z'}"


In [49]:
# Drop the single row whose review text is missing
df = df[df['review'].notna()]

In [50]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28934 entries, 0 to 28934
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   rating             28934 non-null  int64 
 1   date               28934 non-null  object
 2   title              28934 non-null  object
 3   isEdited           28934 non-null  bool  
 4   review             28934 non-null  object
 5   userName           28934 non-null  object
 6   app_name           28934 non-null  object
 7   app_id             28934 non-null  int64 
 8   developerResponse  4182 non-null   object
dtypes: bool(1), int64(2), object(6)
memory usage: 2.0+ MB


In [52]:
df.describe()

Unnamed: 0,rating,app_id
count,28934.0,28934.0
mean,4.217599,882918600.0
std,1.368784,346939200.0
min,1.0,337472900.0
25%,4.0,571800800.0
50%,5.0,992210200.0
75%,5.0,1203637000.0
max,5.0,1363010000.0


In [53]:
df[df['isEdited']]

Unnamed: 0,rating,date,title,isEdited,review,userName,app_name,app_id,developerResponse
11,5,2021-02-10 13:08:43,High Recommend!!,True,"I’ve written a couple app reviews for this app before, but I just want to put in that after using it for about three months now, I must say that stoic is quite a wonderful app! It gives you all sorts of options for things to do that really help my everyday mentality, like breathing excersize, meditations, and fear settings/thought excersize. A high recommend! I’d say my only issue with this app is probably that the morning routines are SO long. Honestly I don’t mind doing them, but I just wish there were options to choose the amount of questions asked. People are busy, not everyone has 30 minutes each morning to answer like 10 questions. Perhaps five or six? Also I like the meditation options they have, I only wish they had a few guided ones. That’s only a request, not a complaint though! My first point was a complaint.",Raimabird,stoic,1312926037,"{'id': 20357398, 'body': 'could you check what hour you have in ‚your profile’ -> personalize -> start day at?\n\nbest,\nm.', 'modified': '2021-01-12T22:23:35Z'}"
20,5,2020-04-11 13:38:29,Im confused,True,"I’ve had it for one day and it’s already messing up. I like the app so far I really do but it is telling me that it’s still Thursday(yesterday) but today is Friday. It might be because I didn’t do my “evening routine but I did it today and still wouldn’t let me go to the next day. I’m confused and don’t know what to do, I tried deleting and re-downloading it but it didn’t work other than just giving me the same day and have to redo it over again. Someone help me please.\nEdit: Never mind I got it to fix itself after quite a bit of time. I do really like the app so far after one day of use but yeah. But also thank you for the response that makes it much easier for me and now it’ll always be on the right day.",wolfgirl💙🌑🐺,stoic,1312926037,"{'id': 14596171, 'body': 'hey,\n\nmy apologies for your trouble. could you open ‘you tab’ -> personalize day -> day start -> pick 12am?\n\nbest,\nm.', 'modified': '2020-04-10T19:52:32Z'}"
62,4,2021-05-18 03:27:09,Excellent Customer Service,True,"When an issue arose (and the issue was mine alone), the developer responded immediately and assisted me greatly. In the event other incorrectly associate this site with anything related to Ryan Holiday, as I did, they are not connected. \n\nOne app bug I do wonder about is why the daily count randomly restarts at one. Also, thought it might be cool if there was a way to print the journals (perhaps even commercially or bound- all or as a selection). Perhaps you can and I’m just not great with tech. Anyway…. Thank you for the email and I am sorry for the incorrect association.",Mark50998926,stoic,1312926037,"{'id': 22856122, 'body': 'hej Mark,\n\nwe are not associated with mr. Holiday in any way (besides the fact that i personally admire his work)\n\nbest,\nmaciej\n\n', 'modified': '2021-05-17T15:50:36Z'}"
249,5,2020-05-05 07:02:30,Love it,True,I’ve been using this for a while and it’s just a great mix of good habits.\n\nOne thing I’d love for the morning questions is the ability to add a free form question or something like: “What value do you aim to grow today?”.\n\nThanks for the great work!,porcoesphino,stoic,1312926037,"{'id': 12623477, 'body': 'uh, you are right, there was a bug. thank you for letting me know.\n\ngood idea about the ‘nature 🌱’ tag. let me know if you have in mind more improvements, happy to implement them :)\n\nbest,\nmaciej', 'modified': '2019-12-28T10:20:49Z'}"
299,3,2020-07-24 14:28:38,Still no fix for toggling questions?,True,I complained before about being unable to toggle off the questions without being prompted to subscribe. Was told there’d be a fix in the next update. At least two updates and still no fix?,AllTheNicksWereTaken,stoic,1312926037,"{'id': 16666136, 'body': 'woops, my mistake. you should be able to turn off questions without a premium sub.\nmy apologies, we will fix that in the next update.\n\nm.\n\n—\n\nyeah, my apologies. i understood from your previous message that we have a bug when toggling the switch.\nbut the fact that you can see the premium wall there is not a bug. the basic options of the app are free, personalization is paid.\n\nm.', 'modified': '2020-07-16T08:21:20Z'}"
...,...,...,...,...,...,...,...,...,...
27421,5,2020-11-04 22:58:55,great!,True,"i LOVE this app it gives me the satisfaction that other apps like this haven’t. it’s so helpful and makes me feel as if i’ve talked with a real therapist. i just have one problem, there’s to many blocked thing because of premium. i don’t want to have to pay to be relaxed. why should you have to pay for an app that’s supposed to help you and make people happy? other than that its great!!",Madison M V.,youper-self-guided-therapy,1060691513,"{'id': 17129276, 'body': 'Hello Madison! Thank you for your positive feedback! We are pleased that you are satisfied with the application! We do our best to keep a useful set of tools free for users who are unable to subscribe, but ultimately, our subscription prices allow us to continue growing the AI capabilities and tools. Feel free to send any questions or feedback to youper.ai/support', 'modified': '2020-08-06T21:07:23Z'}"
27496,5,2021-01-24 12:04:40,Love but doesn’t load all the time,True,"Update: they updated the app, it works when I need it. The recommendations have gotten better and more personal. I use this app all the time. \n\nI love this app. But half the time I go to use it and it never loads. Seems kind of backwards. If it loaded and opened when I needed it it would have 5 stars.",Jenni RRT,youper-self-guided-therapy,1060691513,"{'id': 13474003, 'body': 'Thanks for bringing this to our attention. Your comments are very helpful to us as we are always working to improve Youper and get rid of those annoying bugs ;-). Please access https://www.youper.ai/request to give us more details so we can have this fixed as soon as possible. Remember to mention the device and operating system version.', 'modified': '2020-02-13T20:56:28Z'}"
27514,1,2021-06-20 06:53:21,So sad,True,"I used to be able to track my mood without unlocking everything and now it’s you have to pay money to do that. I get that they need the money, but I don’t have the money to pay for it. It was nice to be able to at least track my mood. Now I’m sad. Update: the mood tracker came back but now it’s gone again and I can’t find it. I hate it when they change things so much, you can’t find anything and it just makes my stress… Update again: I found the mood tracker and it’s LOCKED. Now I can’t do ANYTHING without paying money…",NovellaRose,youper-self-guided-therapy,1060691513,"{'id': 21621602, 'body': 'Hi, thanks for your feedback. You can still access mood tracking. To make the access easier you can favorite to track your mood.\n\nGo to Talk - Search - ""feeling"" - Click on the exercise - Click on the Heart icon to favorite\n\nTalk and Listen favorite exercises are available in the top filter ""My ❤""', 'modified': '2021-03-12T21:39:04Z'}"
27720,5,2020-10-03 22:03:25,self reflection helper,True,"its so helpful to self reflect and see my own patterns of thought. it’s something that has helped me reach so many realizations about my mental health and wellbeing. my only wish is for OCD and eating disorder monitoring, but I trust that will come in time :)",hunnybeest,youper-self-guided-therapy,1060691513,"{'id': 16936546, 'body': ""Thanks for your feedback!! It's great to know that Youper is providing the support you need! Take care!"", 'modified': '2020-07-28T19:18:37Z'}"
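
The `developerResponse` values in the output above look like Python dict literals with `id`, `body`, and `modified` keys. Assuming they are stored as strings (likely, since the data was loaded with `pd.read_csv`), one way to unpack them is `ast.literal_eval`. The sketch below uses one response copied from the output; `parse_response` is a hypothetical helper name.

```python
import ast

import pandas as pd

# One response string copied from the output above.
raw = (
    "{'id': 16936546, "
    "'body': \"Thanks for your feedback!! It's great to know that Youper "
    "is providing the support you need! Take care!\", "
    "'modified': '2020-07-28T19:18:37Z'}"
)

# A tiny stand-in for the real column: one response and one missing value.
responses = pd.Series([raw, None], name="developerResponse")


def parse_response(value):
    """Return the response as a dict, or None when there is no response."""
    if pd.isna(value):
        return None
    # literal_eval only evaluates Python literals, so it is safer than eval.
    return ast.literal_eval(value)


parsed = responses.apply(parse_response)

first = parsed.iloc[0]
print(first["id"])
print(first["modified"])
```

After parsing, the `modified` timestamp could be compared with the review `date` to see whether a response came before or after an edited review.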


In [56]:
df['rating'].value_counts(normalize=True)

5    0.691954
1    0.112117
4    0.098846
3    0.056162
2    0.040921
Name: rating, dtype: float64
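
The proportions above show a heavily imbalanced `rating` column. If `rating` were used as a classification target (an assumption; the notebook has not stated a target at this point), the imbalance sets a floor that any model should beat. A quick sketch using the proportions copied from the output above:

```python
import pandas as pd

# Proportions copied from the value_counts(normalize=True) output above.
rating_share = pd.Series(
    {5: 0.691954, 1: 0.112117, 4: 0.098846, 3: 0.056162, 2: 0.040921},
    name="rating",
)

# A model that always predicts the most common class scores this accuracy.
majority_class = rating_share.idxmax()
baseline_accuracy = rating_share.max()

print(f"Majority class: {majority_class}")
print(f"Accuracy of always predicting it: {baseline_accuracy:.1%}")
```

Always predicting 5 stars would already be right about 69% of the time, so accuracy alone would be a misleading success metric here.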

In [57]:
df['isEdited'].value_counts(normalize=True)

False    0.991049
True     0.008951
Name: isEdited, dtype: float64

In [58]:
df['app_name'].value_counts(normalize=True)

fabulous-daily-routine-planner    0.173187
reflectly-journal-ai-diary        0.172911
calm                              0.172842
headspace-meditation-sleep        0.105827
insight-timer-meditation-app      0.103684
youper-self-guided-therapy        0.057787
talkspace-therapy-counseling      0.031797
ten-percent-happier-meditation    0.031382
slumber-fall-asleep-insomnia      0.023986
sanvello-anxiety-depression       0.019735
stoic                             0.019009
breethe-meditation-sleep          0.018801
shine-calm-anxiety-stress         0.015622
minddoc-your-companion            0.014619
betterme-calm-sleep-meditate      0.006912
meditopia-meditation-breathe      0.006117
relax-meditation-guided-mind      0.005184
rootd-panic-attack-relief         0.005046
moodfit                           0.002938
pzizz-sleep-nap-focus             0.002730
mindshift-cbt-anxiety-relief      0.002454
smiling-mind                      0.002281
happify-for-stress-worry          0.001279
mindfulness

In [63]:
df[df['developerResponse'].notna()]

Unnamed: 0,rating,date,title,isEdited,review,userName,app_name,app_id,developerResponse
9,5,2021-01-25 20:15:12,Absolutely Incredible - Better than Others,False,"Absolutely incredible. The focus breathing, the journaling, and the adaptability to each user. I have never seen such an app before. I have downloaded over 20 journaling apps, but I eventually always delete the app because I forget to journal because the app doesn’t help me. Now, this journaling app is incredible. I write 3-5 times a day and use its focus breathing exercises so much. This is the only app I want to actually support with money even though I am a stingy person.\n\n——\n\nUpdate: I do dislike the new feature where it asks you to think negatively to become grateful without being able to decline like last update. For those who are suffering depression, anxiety, and other mental issues, this is a big black hole that one can go into. I plead the developers to lock this feature away by being able to decline before the app asks you that question.",NemoTheCat,stoic,1312926037,"{'id': 20686361, 'body': 'hey,\n\ncould you please ping me at m@stoicroutine.com so we could find the issue with forcing negative visualization? \n\nthere should be an option to skip it but i worry we messed it up.\n\nbest,\nmaciej', 'modified': '2021-01-27T12:52:23Z'}"
11,5,2021-02-10 13:08:43,High Recommend!!,True,"I’ve written a couple app reviews for this app before, but I just want to put in that after using it for about three months now, I must say that stoic is quite a wonderful app! It gives you all sorts of options for things to do that really help my everyday mentality, like breathing excersize, meditations, and fear settings/thought excersize. A high recommend! I’d say my only issue with this app is probably that the morning routines are SO long. Honestly I don’t mind doing them, but I just wish there were options to choose the amount of questions asked. People are busy, not everyone has 30 minutes each morning to answer like 10 questions. Perhaps five or six? Also I like the meditation options they have, I only wish they had a few guided ones. That’s only a request, not a complaint though! My first point was a complaint.",Raimabird,stoic,1312926037,"{'id': 20357398, 'body': 'could you check what hour you have in ‚your profile’ -> personalize -> start day at?\n\nbest,\nm.', 'modified': '2021-01-12T22:23:35Z'}"
20,5,2020-04-11 13:38:29,Im confused,True,"I’ve had it for one day and it’s already messing up. I like the app so far I really do but it is telling me that it’s still Thursday(yesterday) but today is Friday. It might be because I didn’t do my “evening routine but I did it today and still wouldn’t let me go to the next day. I’m confused and don’t know what to do, I tried deleting and re-downloading it but it didn’t work other than just giving me the same day and have to redo it over again. Someone help me please.\nEdit: Never mind I got it to fix itself after quite a bit of time. I do really like the app so far after one day of use but yeah. But also thank you for the response that makes it much easier for me and now it’ll always be on the right day.",wolfgirl💙🌑🐺,stoic,1312926037,"{'id': 14596171, 'body': 'hey,\n\nmy apologies for your trouble. could you open ‘you tab’ -> personalize day -> day start -> pick 12am?\n\nbest,\nm.', 'modified': '2020-04-10T19:52:32Z'}"
43,1,2020-01-20 19:32:37,Subscription based app masquerading as freemium.,False,"I thought I’d give stoic a try. It sounded interesting. Unfortunately there is no free version (despite setup telling you that you can unlock all the features - implying an unlocked subset). You can get a seven day trial which automatically converts to withdrawing funds from your account. \n\nI have a brain injury that causes me not to function well, and I’ve learned not to allow these (through trial and mostly error) as I will forget about the app, forget when I downloaded it if I do remember, and/or forget to cancel if I’m not getting value. Therefore, I have no way to evaluate the app, even though it may be beneficial for me....\n\nAdditionally, the subscription model, while a great way to fund continued development, is extremely limiting for those of us with limited income due to disability.",BenjPhoto,stoic,1312926037,"{'id': 13029866, 'body': 'hi.\n\njust use the app without a subscription :) there is a free version, and it won’t limit the functionalities.\n\nbest,\nm.', 'modified': '2020-01-21T05:20:34Z'}"
58,3,2021-01-28 00:33:31,Used to have more options,False,"When I first downloaded this app I loved it! It provided a variety of options to cope with my emotions on a day to day basis and I loved the different journal templates to start a journaling session and the way it would use stoicism & thinking trap meditations to help me through especially rough days. However, there must’ve been an update because I no longer have access to the wide range of journal prompts and the ones I do have layer on top of one another instead of allowing for a “at your own pace” kind of app. I’m not sure if I’m ready to abandon ship yet, but I definitely am not getting out of it what I used to.",Peacify,stoic,1312926037,"{'id': 20751864, 'body': 'hmm interesting. we have only moved guided journals to a different place (main today view -> journal).\n\nthis change will give us space to create even more guided experiences :) previous scroll inside journaling was kind of clunky.\n\nbest,\nm.', 'modified': '2021-01-30T22:22:34Z'}"
...,...,...,...,...,...,...,...,...,...
28910,5,2021-06-08 09:47:36,Skeptic turned fan,False,"I had gotten progressively open to trying out the whole meditation thing. I downloaded a couple of apps and ended up feeling like a dummy, which I was, I just didn’t want to acknowledge it I guess. Petit bambou gave me the space to try toe-dipping then wading in it and eventually sneaking off to go full on swimming with backstrokes et al in it. How did I become this addicted? Lawd !",Yo-Ed,mindfulness-with-petit-bambou,941222646,"{'id': 23326269, 'body': ""Hi ! Thank you so much for sharing with us your experience with Petit BamBou. We're so proud to be able to accompany you on your meditation journey... Our goal is to make this awesome tool that is meditation, accessible to everyone. It's truly amazing to read that it worked for you. Don't hesitate to get in touch with us if you have any question : help@petitbambou.com :) "", 'modified': '2021-06-11T09:50:18Z'}"
28916,5,2020-05-12 02:03:40,Great app to relax,False,I listened to the méditations and they were great and relaxing . The voice was soothing and the message was passed Another feature was the sounds I loved them specifically under the water,cbennus,mindfulness-with-petit-bambou,941222646,"{'id': 15729431, 'body': 'Great to hear that you not only like the meditations, but also the sounds! All the best to you! :-)', 'modified': '2020-06-02T14:27:42Z'}"
28917,5,2020-02-18 14:22:16,Indispensable!!,False,"Very good and clear design application that guides you into very interesting meditations. Available in many languages, this meditation app offers the best of the best, in many ways.",Diamond SkyWest,mindfulness-with-petit-bambou,941222646,"{'id': 13662836, 'body': ""Wow - we're proud to hear that, thank you! Wishing you all the best and happy meditating!"", 'modified': '2020-02-24T09:32:41Z'}"
28920,5,2020-05-28 21:54:37,Great App!,False,"I highly recommend this app! I like the analogies and the lessons. It has helped me already and I’m looking forward to future lessons. Thanks, BamBou! :)",Chati 11,mindfulness-with-petit-bambou,941222646,"{'id': 15729394, 'body': ""You're very welcome and we're really happy to hear that! Wishing you all the best!"", 'modified': '2020-06-02T14:26:34Z'}"


In [66]:
df[df.duplicated(keep=False)]

Unnamed: 0,rating,date,title,isEdited,review,userName,app_name,app_id,developerResponse
24872,5,2020-03-03 17:19:53,Good karma,False,Good karma,raj2ray,insight-timer-meditation-app,337472899,
24880,5,2021-04-16 17:11:10,Majestuoso,False,Agradecida,samagandu,insight-timer-meditation-app,337472899,
24897,5,2021-04-16 17:11:10,Majestuoso,False,Agradecida,samagandu,insight-timer-meditation-app,337472899,
24918,5,2020-03-03 17:19:53,Good karma,False,Good karma,raj2ray,insight-timer-meditation-app,337472899,
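
The output above lists both copies of each repeated review because `keep=False` flags every member of a duplicate group, and the comparison uses column values only, so rows with different index labels can still match. A minimal sketch on made-up data (the `toy` frame and its values are hypothetical, not taken from the review dataset) shows how that differs from the default behaviour of `drop_duplicates()`:

```python
import pandas as pd

# Hypothetical toy data; only the column values are compared, never the index.
toy = pd.DataFrame(
    {
        "userName": ["ann", "bob", "ann", "cy"],
        "title": ["Good", "Okay", "Good", "Bad"],
        "rating": [5, 3, 5, 1],
    },
    index=[10, 11, 12, 13],
)

# keep=False marks every row in a duplicate group, not just the later repeats.
all_copies = toy[toy.duplicated(keep=False)]

# drop_duplicates() keeps the first row of each group by default (keep="first").
deduped = toy.drop_duplicates()

print(all_copies.index.tolist())
print(deduped.index.tolist())
```

Inspecting with `keep=False` first makes it easy to confirm that the flagged rows really are full copies before any of them are dropped.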


In [83]:
df = df.drop_duplicates()

In [84]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 28932 entries, 0 to 28934
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   rating             28932 non-null  int64 
 1   date               28932 non-null  object
 2   title              28932 non-null  object
 3   isEdited           28932 non-null  bool  
 4   review             28932 non-null  object
 5   userName           28932 non-null  object
 6   app_name           28932 non-null  object
 7   app_id             28932 non-null  int64 
 8   developerResponse  4182 non-null   object
dtypes: bool(1), int64(2), object(6)
memory usage: 2.0+ MB
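
The summary above shows `date` stored as `object` and `developerResponse` populated for only 4182 of 28932 rows. One possible scrubbing step, sketched here on made-up rows, is to convert the dates and pull the reply text out of the response. This assumes the response strings are Python dict literals, as the sample rows suggest; the `has_response` and `response_body` columns and the `response_body` helper are hypothetical names, not part of the original notebook.

```python
import ast

import pandas as pd

# Hypothetical rows shaped like the review table.
toy = pd.DataFrame(
    {
        "date": ["2020-05-12 02:03:40", "2020-03-03 17:19:53"],
        "developerResponse": [
            "{'id': 15729431, 'body': 'Great to hear!', 'modified': '2020-06-02T14:27:42Z'}",
            None,
        ],
    }
)

# Convert the string timestamps to a proper datetime dtype.
toy["date"] = pd.to_datetime(toy["date"])

# Flag whether the developer replied at all.
toy["has_response"] = toy["developerResponse"].notna()


def response_body(raw):
    """Return the reply text, or None when there is no response."""
    if pd.isna(raw):
        return None
    # literal_eval safely parses a dict literal without executing code.
    return ast.literal_eval(raw)["body"]


toy["response_body"] = toy["developerResponse"].apply(response_body)
print(toy.dtypes)
```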


In [82]:
df[df.duplicated(['userName', 'app_name'], keep=False)]

Unnamed: 0,rating,date,title,isEdited,review,userName,app_name,app_id,developerResponse
25366,4,2021-01-24 18:03:56,"Worth it, but with a flaw for me",False,"I found MindDoc (previously Moodpath) when I was struggling with my mental health and wanted to track my emotions and journal my thoughts throughout the day. This app is perfect for that. \n\nCurrently, the free version asks you a few questions 3x a day, allows you to record your emotional state and any thoughts, and gives you a report every two weeks. \n\nWhat appeals to me is the streamlined nature of the app. It has additional content for those interested, but for me, the free version fufills all that I need it to.\n\nThe only flaw is that I had data for about 6 months before they changed the name and structure of the app, resulting in the loss of all my journal enteries. Unfortunately, they were unable to retrieve the data due to the way the app was structured before. Other than this flaw, I really enjoy this app and would recommend it to others.",Hobbit of the Shire,minddoc-your-companion,1052216403,
25395,2,2021-01-19 21:15:26,"Love app, but lost data!",False,"I’ve really enjoyed using this app. It’s very streamlined, just a mood tracker and journal. Perfect!\n\nUnfortunately, when Moodpath switched to MindDoc, I got logged out (forgot the password) and lost my data. It won’t send me an email to reset it. I’m seeing other reviewers experiencing the same problem. I’ve already contacted the support email with my issue and still waiting on a response. \n\nI’ll update this review to five stars when it gets fixed because it’s worked great up till this point. Thank you!",Hobbit of the Shire,minddoc-your-companion,1052216403,"{'id': 20551815, 'body': 'Hello, thank you for reaching out. We are very sorry you lost your data! We will contact you via mail as soon as we manage to bring it back. Our team is working hard to solve this inconvenient problem. Best wishes and thank you for your patience, your MindDoc team', 'modified': '2021-01-21T11:43:06Z'}"
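
The two rows above come from the same user reviewing the same app on different dates, so they are not exact duplicates and `drop_duplicates()` with no arguments leaves both in place. The notebook does not say how these should be handled. One option, sketched below on made-up data, is to keep only each user's most recent review per app; whether that suits the analysis is a judgment call, and keeping both reviews is equally defensible.

```python
import pandas as pd

# Hypothetical rows: one user reviewing the same app twice.
toy = pd.DataFrame(
    {
        "userName": ["hobbit", "hobbit", "nemo"],
        "app_name": ["minddoc", "minddoc", "stoic"],
        "date": ["2021-01-24 18:03:56", "2021-01-19 21:15:26", "2021-01-25 20:15:12"],
        "rating": [4, 2, 5],
    }
)

# Parse the dates so the sort is chronological rather than lexical.
toy["date"] = pd.to_datetime(toy["date"])

# Sort oldest to newest, then keep the last (newest) row per user/app pair.
latest = (
    toy.sort_values("date")
    .drop_duplicates(subset=["userName", "app_name"], keep="last")
    .sort_index()
)

print(latest[["userName", "app_name", "rating"]])
```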


In [85]:
from nltk.stem import WordNetLemmatizer  # import the class before instantiating it

lemmatizer = WordNetLemmatizer()

# SCRUB

# EXPLORE

# MODEL

# iNTERPRET

# CONCLUSIONS & RECOMMENDATIONS

> Summarize your conclusions, then list your recommendations as bullet points, grounding each one in your modeling results.