# Exploring the Reddit API

3: Exploration & questions
When we work with clean, local datasets, we can usually dive straight into the data and come up with questions afterwards. Reddit isn't a ready-made dataset that you can immediately start exploring, so you'll need to browse Reddit first, come up with interesting questions to ask, and then determine what data you'll need from the API.

To help get your creative juices flowing, here are some potentially interesting questions:

* How does the number of members in a subreddit affect how frequently new content is submitted?
* What's the distribution of originating subreddits for the content that makes it to the front page? How does this change hour by hour, or day by day?
* Which words or phrases show up in the submission titles the most for each subreddit?
* How effectively can you predict if a new content submission will hit the front page?
* How often do users resubmit content that has been submitted before, and which submission attempt fares better?

4: Data acquisition
After you've come up with some questions you're interested in answering, you'll need to establish a data acquisition strategy. Keep in mind that not every question can be answered using the API. Questions involving historical data can be difficult to answer because of the limits on what the API gives you access to. You may also need to incorporate other datasets! Here are some other things you'll want to keep in mind:

* API limits: https://github.com/reddit/reddit/wiki/API#rules
* API guidelines: https://www.reddit.com/wiki/licensing
* Data acquisition timeline: the API limits you to 30 requests a minute, so estimate how long collecting your data will take
* Data storage: will you store the data in files or in a database?
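
To make the timeline and storage points concrete, here is a minimal sketch of one way to pace requests and store raw responses. It assumes the limit of 30 requests a minute mentioned above and stores one JSON document per line in a file. `RateLimiter`, `collect`, and `fetch_page` are hypothetical names made up for this example, not part of any Reddit library; `fetch_page` stands in for whatever function you write to call the API.

```python
import json
import time


class RateLimiter:
    """Space out calls so that at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute=30, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute  # seconds between requests
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def collect(fetch_page, n_pages, out_path, limiter):
    """Call fetch_page(i) n_pages times and append each result as one JSON line.

    fetch_page is a placeholder for your own API call (an assumption here);
    it must return something json.dumps can serialize.
    """
    with open(out_path, "a", encoding="utf-8") as f:
        for i in range(n_pages):
            limiter.wait()
            f.write(json.dumps(fetch_page(i)) + "\n")
```

At 30 requests a minute the limiter leaves two seconds between calls, so 900 requests would take about half an hour. Appending each response to a file as you go means an interrupted run doesn't lose what was already collected.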

5: Data transformation
The Reddit API returns JSON data, which may work fine for basic counting but is cumbersome for more advanced queries. The next important step is to write code that transforms the data into a more convenient format. If the question you're answering doesn't need much data or acquisition time, you can convert the JSON data directly into a DataFrame. If acquiring the data takes longer than an interactive analysis session allows, it's best to transform the data into a tabular format like CSV that you can reload later. The decision is up to you!
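
As a rough sketch of the transformation step, the code below flattens a listing response into flat rows and writes them as CSV using only the standard library. It assumes the response has the listing shape (`data` → `children` → `data`) and that the fields in `FIELDS` are the ones you care about; `listing_to_rows`, `write_csv`, and the sample values are made up for illustration.

```python
import csv
import io

# Fields to keep from each post; adjust to the question you're answering.
FIELDS = ["subreddit", "title", "score", "num_comments", "created_utc"]


def listing_to_rows(listing, fields=FIELDS):
    """Flatten one listing response into a list of flat dicts, one per post."""
    children = listing.get("data", {}).get("children", [])
    # Missing fields become None, which the csv module writes as an empty cell.
    return [{k: child["data"].get(k) for k in fields} for child in children]


def write_csv(rows, fileobj, fields=FIELDS):
    """Write the flat rows to an open text file object as CSV with a header."""
    writer = csv.DictWriter(fileobj, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative, made-up sample in the assumed listing shape.
sample = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"subreddit": "python", "title": "Hello",
                                    "score": 10, "num_comments": 2,
                                    "created_utc": 18000.0}},
        ]
    },
}

rows = listing_to_rows(sample)
buffer = io.StringIO()
write_csv(rows, buffer)
```

If you'd rather work interactively, the same list of flat dicts can be passed to `pandas.DataFrame(rows)` instead of being written to CSV.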

6: Feature engineering
Depending on the question you're exploring, you may need to perform some further transformation of the data. If the data is stored in a DataFrame, you can quickly create new columns from existing columns that you think will be useful. In other cases, you may need to extract or calculate values from multiple datasets you've generated. This depends a lot on the questions you're trying to answer!
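
For example, if your question is about when content gets submitted, you might derive time-based features from each post's timestamp. The sketch below assumes each row is a flat dict with a `created_utc` value in Unix seconds and a `title` string; `add_features` and the new column names are hypothetical choices for this example.

```python
from datetime import datetime, timezone


def add_features(row):
    """Return a copy of the row with a few derived columns added."""
    created = datetime.fromtimestamp(float(row["created_utc"]), tz=timezone.utc)
    out = dict(row)  # copy so the original row is left unchanged
    out["hour_utc"] = created.hour             # 0-23, hour of submission in UTC
    out["weekday"] = created.weekday()         # 0 = Monday ... 6 = Sunday
    out["title_words"] = len(row["title"].split())  # whitespace-separated words
    return out


# Made-up sample row: 18000 seconds after the Unix epoch is 05:00 UTC on a Thursday.
example = add_features({"title": "Hello Reddit API world", "created_utc": 18000})
```

With a DataFrame, the same idea becomes creating new columns from existing ones. Keeping the derivation in a small function makes it easy to reuse across the datasets you've generated.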

7: Data visualization
The next step is to explore the data using visualization. If you're new to data visualization, we highly recommend our data visualization course. Python data visualization libraries include a wide range of plots, such as histograms, box and whisker plots, and pair plots.