# Collecting Data

The foundation of your project is collecting appropriate data for the problem you are trying to solve. A machine learning model can be used in an incredible variety of ways. Some examples are predicting the next social media trend, categorizing cancer cells, better understanding the relationship between temperature fluctuations and coffee bean growth rates, or creating a chatbot.

Since we are very early in this course and most likely have not yet discussed many models, it may be difficult to know what you can do with a dataset or what is in scope for this course. So, in keeping with the course title "Exploring Machine Learning", we will take an exploratory approach to your project.

The goal of this part of the project is to explore which datasets you might be interested in. The questions below will help guide you toward a category of data that you want to explore further.

## Identifying what data you want to explore

Data is everywhere, and there seems to be data on just about anything. You might know exactly what you want to dive deeper into, or you might have no idea. Either way, I invite you to answer the questions below.

Below, create a Python dictionary in which each key is a short summary of a topic of interest and each value is an explanation of your interest, such as why you are interested in the topic or why you feel a strong passion to understand it. A topic of interest could be research you are conducting, a topic you are studying at your job, a hobby, or a topic surrounding your identities.

List 5 topics, and for each topic write a description of at least 50 words.

For example, I might put:

```python
interests = {
    "Cats" : "I have two cats at home, they are basically my children. I would generally like to learn more about cat behavior, health trends and pet owner behavior. It may be interesting to also see industry trends of cat owners, or how they compare to dog owners. Maybe later I want to start to write an app that recognizes cat breeds.",

    "Scuba" : "The study of scuba diving seems to be a 'soft' science, and there are general guidelines on when and how long you should do safety stops to avoid getting decompression sickness. Could there be links to human anatomy or behavior on how deep a person should go safely during a dive?",
}
```
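
To check a dictionary like the one above against the requirements (5 topics, at least 50 words each), a small helper along these lines may be useful. This is a minimal sketch, not the course grader: `check_interests` and its parameters are hypothetical names, and it assumes that "5 topics" means at least 5 and that words are separated by whitespace.

```python
def check_interests(interests, min_topics=5, min_words=50):
    """Return a list of problems; an empty list means the dictionary passes.

    Hypothetical helper: the thresholds mirror the assignment text, not the grader.
    """
    problems = []
    if len(interests) < min_topics:
        problems.append(
            f"Found {len(interests)} topics, expected at least {min_topics}."
        )
    for topic, description in interests.items():
        # split() with no arguments splits on any run of whitespace,
        # so extra spaces from line continuations do not inflate the count.
        word_count = len(description.split())
        if word_count < min_words:
            problems.append(
                f"'{topic}' has {word_count} words, expected at least {min_words}."
            )
    return problems


# A deliberately short example that fails both checks.
example = {"Cats": "I have two cats at home, they are basically my children."}
for problem in check_interests(example):
    print(problem)
```

Calling `check_interests(what_are_topics_you_are_interested_in())` on your own answer should print nothing if both requirements are met.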

```python
# Do not edit the name of this function, it will be used for grading
def what_are_topics_you_are_interested_in():
    interests = {
        "Cognitive Science" : "I have been interested in how people think and why they behave in various ways since I was little. \
    After my time in the Navy, I decided to pursue a medical degree. However, as I studied Biology/Chemistry/Physics, I gravitated \
    toward neuroscience, feeling called by the mysteries of the brain and a possible mind. To me, the fact that the 'lights are on,' \
    that I exist as a thinking thing in this world, is weirder and cooler than anything else. Thus to push human understanding of our \
    own internal system is a lifelong goal of mine.",

        "Video Games" : "Similar to my interest in the mind, my interest in video games started when I was a kid. I can remember staying up \
    past my bedtime to play Pokemon by lava lamp light. However, as much as I believe video games are a leisure activity, these games \
    may be the best methodology to study the mind. This virtual platform can potentially be a foundational task environment for \
    cognitive science of equal magnitude as the Drosophila for genetics. Given that the environments are completely replicable while \
    being dynamic and the user is stationary, insights into real-time strategic decision-making and problem-solving behaviors in humans \
    can likely be gleaned.",

        "Spacetime lines" : "My fascination with research lies in its ability to model the world's evolution through mathematics. \
    One of my favorite physicists Richard Feynman simplified complex particle interactions into spacetime diagrams. Like him, I aspire \
    to translate in-game behaviors into similar spacetime diagrams to deepen our understanding of the human mind, using game telemetry \
    and psychophysiological measurement tools. Other versions of these lines can be found in migration patterns, fluid dynamics, etc.",

        "Religions" : "The spiritual expression of the planet is something that I also find super interesting and important. Growing up in \
    the rural south, I was presented with various forms of Christianity to explain our moral code and reason for existence. However, again, \
    with age and curiosity, I found a hidden world. While I think the Christian tradition in aspects like ultimate utilitarianistic sacrifice \
    is beautiful, I also find the practices of Taoism, Buddhism, Hinduism, Islam, and Judaism enlightening. To paraphrase Alan Watts, \
    religions are fingers pointing at the moon, if you stare at the finger you'll miss the moon.",

        "Walking around in Umstead Park with my dog Ashe" : "My favorite thing about the Raleigh area is Umstead Park. Every \
    fall/winter/early-spring Ashe and I have gone exploring in the woods of Umstead. He is a blue heeler and obeys commands well, so together \
    we go exploring in the woods between the trails. We've seen several coyotes, a murder of thousands of crows, beavers, raccoons, orb weaver \
    spiders, and deer galore. I've collected about ten dropped deer antlers that I've turned into a candle holder. He and I will go out in the \
    mornings, he'll run and I'll listen to whatever philosophical YouTube video or book is piquing my interest. I've listened to entire lecture \
    series, classical novels, and music while romping around with Sir Thingaton Leopold the third Jr. Sr. esquire the viscount of Remington Place \
    (Ashe)."
    } # Fill out your interests
    return interests
```
```python
# Note: you can use the \ symbol to continue your string on the next line, which makes
# things look a bit prettier.
# Example:
print("This is an \
      extended string ")
```

```
This is an       extended string 
```
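
As the printed output shows, a backslash inside a string literal keeps the next line's leading spaces as part of the string. If those extra spaces are unwanted, one alternative is to place adjacent string literals inside parentheses, which Python joins into a single string. The variable names below are illustrative only.

```python
# Backslash continuation: the indentation of the second line ends up in the string.
with_backslash = "This is an \
      extended string"

# Adjacent string literals inside parentheses are concatenated by Python,
# so the indentation of the source code is not included.
with_parentheses = (
    "This is an "
    "extended string"
)

print(repr(with_backslash))
print(repr(with_parentheses))  # 'This is an extended string'
```

Either form is valid for the dictionary values above; the parenthesized form just keeps the stored text free of indentation.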


## Do datasets exist for my interests?

There is a lot of data out there, but not for everything. Below are some websites where you can look at available datasets. Go ahead and search for datasets related to your topics. Are there many datasets surrounding your topic? Are there many different types of data, such as categorical labels, numerical values (suited to regression), images, etc.? If datasets are limited, do you feel comfortable with the challenge of creating your own data? (Note: creating your own dataset to supplement existing datasets will increase your score on this assignment.)

> You can find a link to databases on the course page!

For 3 of your topics, find 3 datasets you might want to use for your project. Below, create a dictionary whose keys are the topic names you listed above and whose values are lists of links to the 3 datasets you would like to explore. If you would also like to make your own data, add the string "Create my own data" to the end of the list.

Note: if you have trouble finding datasets for your topic, you can make your topic more general or try a different topic. For example, for my "Cats" topic I could expand it to "Pets", "Pet Toy Sales", or "Pet Health Benefits".

You can always change your topic and dataset later, so don't feel that these decisions are permanent.

Did searching generate any ideas for interests or datasets you would like to explore? If so, you can add a topic to the dictionary above or replace an existing one!


Example:

```python
datasets = {
    "Cats" : ["https://www.kaggle.com/datasets/ma7555/cat-breeds-dataset", "https://example.com", "https://example.com", "Create my own data"],

    "Second Topic": ["https://example.com", "https://example.com", "https://example.com"],
    
    "Third Topic" : ["https://example.com", "https://example.com", "https://example.com"]
}
```
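
The structure described above (3 topics, each with a list of 3 links, optionally ending in "Create my own data") can be checked with a short helper. This is a sketch under assumptions, not the course grader: `check_datasets` and its parameters are hypothetical, and an entry counts as a link only if it starts with `http://` or `https://`.

```python
def check_datasets(datasets, interests=None, topics_required=3, links_required=3):
    """Return a list of problems; an empty list means the structure looks right.

    Hypothetical helper: it checks the shape of the dictionary only,
    not whether the links actually work.
    """
    problems = []
    if len(datasets) < topics_required:
        problems.append(
            f"Found {len(datasets)} topics, expected at least {topics_required}."
        )
    for topic, entries in datasets.items():
        # Optionally confirm the topic was also listed in the interests dictionary.
        if interests is not None and topic not in interests:
            problems.append(f"'{topic}' is not a key in the interests dictionary.")
        # Entries such as "Create my own data" are not counted as links.
        links = [e for e in entries if e.startswith(("http://", "https://"))]
        if len(links) < links_required:
            problems.append(
                f"'{topic}' has {len(links)} links, expected at least {links_required}."
            )
    return problems


example_datasets = {
    "Cats": ["https://example.com", "https://example.com", "Create my own data"],
}
for problem in check_datasets(example_datasets, interests={"Cats": "..."}):
    print(problem)
```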


```python
def find_some_datasets():
    datasets = {
        "Cognitive Science" : ["https://openneuro.org/", "http://www.humanconnectomeproject.org/", "https://reproducibility.stanford.edu/"],

        "Video Games" : ["https://corgis-edu.github.io/corgis/csv/video_games/", "https://www.kaggle.com/datasets",
                         "https://developer.riotgames.com/apis", "My own LoL research data from placebo experiment"],

        "Spacetime lines" : ["https://www.earthdata.nasa.gov/", "https://www.movebank.org/cms/movebank-main", "https://animove.org/"]
    }
    return datasets
```

## Asking questions about your dataset

Some questions you might want to ask for each dataset are:
- Who created this dataset?
- When was this dataset created?
- Could there be any biases when creating this dataset?
- How was this data collected?
- Is this data representative of the problem I am trying to solve?
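
One way to keep track of these questions is to record an answer for each one per dataset, in the same dictionary style used earlier. The sketch below is only a suggestion: `QUESTIONS`, `unanswered_questions`, and the placeholder answer are hypothetical and not part of the assignment.

```python
# The five questions from the list above, used as dictionary keys.
QUESTIONS = [
    "Who created this dataset?",
    "When was this dataset created?",
    "Could there be any biases when creating this dataset?",
    "How was this data collected?",
    "Is this data representative of the problem I am trying to solve?",
]


def unanswered_questions(notes):
    """Return the questions that have no answer, or only whitespace, in notes."""
    return [q for q in QUESTIONS if not notes.get(q, "").strip()]


# Start with empty answers for one dataset, then fill them in as you investigate.
notes = {question: "" for question in QUESTIONS}
notes["Who created this dataset?"] = "Placeholder: fill in after reading the dataset page."

for question in unanswered_questions(notes):
    print("Still to answer:", question)
```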
