# BrainStation Flaredown Capstone: Project Summary

**Author:** Sarah Gates

**Date:** September 2022

**Contact:** sarahgates2015@u.northwestern.edu

# Table of Contents

* [1. Introduction](#intro-bullet)

    * [a. What is Flaredown?](#intro-bullet1)
    * [b. Original Data Format](#intro-bullet2)
    * [c. Capstone Target Question](#intro-bullet3)

    <br>

* [2. Cleaning, Filtering, and EDA: Round 1](#First-bullet)

    * [a. Exploration and Basic Cleaning](#First-bullet1)
        * [i. User IDs]()
        * [ii. Age]()
        * [iii. Sex]()
        * [iv. Country]()
        * [v. Check-in Date]()
        * [vi. Trackable ID]()
        * [vii. Trackable Type]()
        * [viii. Trackable Name]()
        * [ix. Trackable Value]()
        
        <br>
        
    * [b. Initial Filtering and Entity Resolution]()
        * [i. Symptom and Condition Filtering]()
        * [ii. Manual Entity Resolution]()
        * [iii. Fibromyalgia and Non-fibromyalgia Split]()
        * [iv. Condition Selection]()
        * [v. Symptom Selection]()
        * [vi. Easy and Hard Data Sets]()

        <br>

    * [c. Aggregation and Feature Engineering]()
        * [i. Manual Entity Resolution]()

        <br>

* [3. Modeling: Round 1](#Second-bullet)



# 1. Introduction<a class="anchor" id="intro-bullet"></a>

This notebook contains a concise summary of my BrainStation Flaredown Capstone project. Each portion of the project is discussed in detail, and the important findings are highlighted.





## a. What is Flaredown?<a class="anchor" id="intro-bullet1"></a>

Flaredown is an app that helps patients with chronic autoimmune and invisible illnesses improve their symptoms by avoiding triggers and evaluating their treatments. Each day, patients track their symptom severity, treatments and doses, and any potential environmental triggers (foods, stress, allergens, etc.) they encounter.

**About the data**

Instead of coupling symptoms to a particular illness, Flaredown asks users to create their unique set of conditions, symptoms and treatments (“trackables”). They can then “check-in” each day and record the severity of symptoms and conditions, the doses of treatments, and “tag” the day with any unexpected environmental factors.

The dataset is available on Kaggle: https://www.kaggle.com/datasets/flaredown/flaredown-autoimmune-symptom-tracker

## b. Original Data Format<a class="anchor" id="intro-bullet2"></a>

The original data from Kaggle contains the following features, as described on the Kaggle page:

| Data      | Description                                                                                                                                                                                                                                                                                                                                                  |
| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| User      |  includes an ID, age, sex, and country                                                                                                                                                                                                                                                                                                                       |
| Condition | an illness or diagnosis, for example Rheumatoid Arthritis, rated on a scale of 0 (not active) to 4 (extremely active)                                                                                                                                                                                                                                        |
| Symptom   | self-explanatory, also rated on a 0–4 scale                                                                                                                                                                                                                                                                                                                  |
| Treatment | anything a patient uses to improve their symptoms, along with an  optional dose, which is a string that describes how much they took  during the day. For instance “3 x 5mg”                                                                                                                                                                                 |
| Tag       | a string representing an environmental factor that does not occur every day, for example “ate dairy” or “rainy day”                                                                                                                                                                                                                                          |
| Food      | food items were seeded from the publicly-available USDA food database. Users have also added many food items manually                                                                                                                                                                                                                                        |
| Weather   | weather is pulled automatically for the user's postal code from the Dark  Sky API. Weather parameters include a description, precipitation  intensity, humidity, pressure, and min/max temperatures for the day                                                                                                                                              |
| HBI       | the Harvey Bradshaw Index is a standardized metric to gauge the severity  of Crohn's disease specifically, often used in evaluation of therapies.  Patients with Crohn's disease who scored 3 or less on the HBI are very  likely to be in remission according to the CDAI. Patients with a score  of 8 to 9 or higher are considered to have severe disease |


If users do not see a symptom, treatment, tag, or food in the Flaredown database (for instance “Abdominal Pain” as a symptom), they may add it by simply naming it. This means that the data requires some cleaning, but it is patient-centered and reflects the patients' primary concerns.
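
Because names are typed freely, the same concept can appear under several spellings. The snippet below is a minimal sketch of one possible first cleaning step, not the method used in this project: the helper `normalize_trackable_name` and the example strings are hypothetical and only illustrate how casing and whitespace variants could be collapsed before any manual entity resolution.

```python
import re


def normalize_trackable_name(name: str) -> str:
    """Lowercase, trim, and collapse repeated whitespace in a user-entered name.

    Hypothetical helper for illustration only.
    """
    return re.sub(r"\s+", " ", name.strip().lower())


# Made-up variants of the same user-entered symptom.
raw_names = ["Abdominal Pain", "abdominal pain ", "ABDOMINAL  PAIN"]

# All three variants collapse to a single normalized spelling.
unique_names = {normalize_trackable_name(n) for n in raw_names}
print(unique_names)
```

Normalization of this kind only catches formatting differences; synonyms and misspellings (for example "stomach ache" versus "abdominal pain") would still need a manual or fuzzy matching step.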

**Data Format**

Below is an image of the original input dataframe, followed by a table with a description of each column:

![Original input dataframe](dataframe.png)

| Column           | Description                                                                                                         |
| ---------------- | ------------------------------------------------------------------------------------------------------------------- |
| user_id          | Contains a unique string of characters indicating each unique user in the data set                                  |
| age              | A float value indicating the age of the user                                                                        |
| sex              | A string indicating the sex of the user (four possible: male, female, other, doesn't say)                           |
| country          | The abbreviated short form (e.g., GB for Great Britain) indicating the country of residence of the user             |
| checkin_date     | The date of entry for the row                                                                                       |
| trackable_id     | A unique id for the string entered in trackable_name                                                                |
| trackable_type   | The type of entry, one of seven possible strings (symptom, condition, food, treatment, weather, tag or HBI)         |
| trackable_name   | A string containing a description of the trackable_type (e.g., if Symptom, then could be 'pain in elbow')           |
| trackable_value  | An integer ranging from 0-4 indicating the severity of the particular entry, where applicable (0 is low, 4 is high) |
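
As a rough illustration of this long format (one row per user, date, and trackable), the sketch below builds a tiny made-up sample and loads it with pandas. The column names follow the table above, with the date column assumed to be spelled `checkin_date`; the rows, the user IDs, and the choice to keep only symptom and condition rows are assumptions for the example, not the project's actual pipeline.

```python
import io

import pandas as pd

# Tiny made-up sample in the same long format as the table above.
sample_csv = io.StringIO(
    "user_id,age,sex,country,checkin_date,trackable_id,"
    "trackable_type,trackable_name,trackable_value\n"
    "u1,32.0,female,US,2017-01-01,1,condition,Fibromyalgia,3\n"
    "u1,32.0,female,US,2017-01-01,2,symptom,Fatigue,4\n"
    "u1,32.0,female,US,2017-01-01,3,treatment,Ibuprofen,2 x 200mg\n"
    "u2,,male,GB,2017-01-02,2,symptom,Fatigue,1\n"
)

df = pd.read_csv(sample_csv, parse_dates=["checkin_date"])

# Keep only the rows rated on the 0-4 severity scale.
is_rated = df["trackable_type"].str.lower().isin(["condition", "symptom"])
rated = df[is_rated].copy()

# Non-numeric values (such as dose strings) would become NaN here.
rated["trackable_value"] = pd.to_numeric(rated["trackable_value"], errors="coerce")

# Mean severity per trackable type in the sample.
print(rated.groupby("trackable_type")["trackable_value"].mean())
```

Filtering before the numeric conversion keeps treatment doses such as "2 x 200mg" out of the severity column, since those strings share `trackable_value` with the 0 to 4 ratings in this sample.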

## c. Capstone Target Question<a class="anchor" id="intro-bullet3"></a>

The original 

# 2. Cleaning, Filtering, and EDA: Round 1<a class="anchor" id="First-bullet"></a>

## a. Exploration and Basic Cleaning<a class="anchor" id="First-bullet1"></a>

## b. Initial Filtering and Entity Resolution

# 3. Modeling: Round 1<a class="anchor" id="Second-bullet"></a>