# Example: Grouping

In [None]:
import pandas as pd

df = pd.read_csv(r"data.csv")
pd.set_option("display.max_columns", None)
df.drop_duplicates(inplace=True)

# Displays the count for the first 10 publishers
print("Count")
print("------------------------")
print(df.groupby("publisher").publisher.count().head(10))

# Displays the maximum price and rating for the first 10 publishers
print()
print()
print("Maximum")
print("------------------------")
print(df.groupby("publisher")[["price", "rating"]].max().head(10))

# Displays the min, max and sum of the values in the price column for the first 10 publishers
print()
print()
print("Aggregate (Min, Max, Sum)")
print("------------------------")
print(df.groupby("publisher").price.agg(["min", "max", "sum"]).head(10))

# Displays the sorted list of values in the price column for the last 4 publishers
print()
print()
print("Sorted List")
print("------------------------")
print(df.groupby("publisher").price.agg([sorted]).tail(4))
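
Because `data.csv` is not included here, the following minimal sketch repeats the count and min/max/sum aggregations on a small made-up DataFrame so the output can be checked by eye. The `books` frame, its publisher names, and its values are invented for illustration; only the column names `publisher`, `price`, and `rating` come from the example above.

```python
import pandas as pd

# Hypothetical stand-in for data.csv; every row here is invented.
books = pd.DataFrame({
    "publisher": ["Apex", "Apex", "Birch", "Birch", "Birch"],
    "price": [10.0, 30.0, 5.0, 15.0, 25.0],
    "rating": [4.1, 3.8, 4.5, 4.0, 4.7],
})

# size() returns the number of rows in each publisher group.
counts = books.groupby("publisher").size()

# Several aggregates of one column, requested by name as strings.
price_stats = books.groupby("publisher")["price"].agg(["min", "max", "sum"])

print(counts)
print(price_stats)
```

Each aggregate becomes its own column in `price_stats`, indexed by publisher, so a single value can be read with `price_stats.loc["Birch", "sum"]`.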

# Example: Sorting

In [None]:
import pandas as pd

df = pd.read_csv(r"data.csv")
pd.set_option("display.max_columns", None)
df.drop_duplicates(inplace=True)

# Sorts page_count values (increasing)
print("Page Counts (Ascending)")
print("------------------------")
print(df[["title", "page_count"]].sort_values(by="page_count").head())

# Sorts page_count values (decreasing)
print()
print()
print("Page Counts (Ascending = False)")
print("------------------------")
print(df[["title", "page_count"]].sort_values(by="page_count", ascending=False).head())

# Sorts by the rating (decreasing) and then by the author (alphabetically)
print()
print()
print("Highest Rated Authors (Alphabetically)")
print("------------------------")
print()
print(df[["author", "rating"]].sort_values(by=["rating", "author"], ascending=[False, True]).head())
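
To see how the two sort keys interact, here is a small sketch on invented data in which two authors share the same rating. The `authors` frame and its names are hypothetical; the `sort_values` call has the same form as the one above.

```python
import pandas as pd

# Hypothetical data; two authors are tied at a rating of 4.5.
authors = pd.DataFrame({
    "author": ["Cole", "Adams", "Baker", "Diaz"],
    "rating": [4.5, 4.5, 3.9, 4.8],
})

# Rating is sorted high to low; ties are broken by author name, A to Z.
ranked = authors.sort_values(by=["rating", "author"], ascending=[False, True])
top_authors = ranked["author"].tolist()

print(ranked)
```

The `ascending` list lines up position by position with the `by` list, which is why Adams is listed before Cole even though both have a rating of 4.5.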



# Problem 1 - Billionaire Sort

Using the Billionaires dataset, complete the following tasks:

1. Group the dataset by gender. Only display the gender count. What insights does this grouping give you?

2. Group the dataset by industry. Only display the industry count. What insights does this grouping give you?

3. Sort the billionaires by age. You will want the youngest person to be listed first. Only print the name and age columns. Only print the first five rows.

4. Sort by net worth. You will want the person with the highest net worth to be listed first. Only print the name and net_worth columns. Only print the first five rows.

5. Sort by age and then net worth. You will want the oldest person to be listed first. If there are two people with the same age, you’ll want the person with the highest net worth to be listed first. Only print the name, age, and net_worth columns. Only print the first TEN rows.

In [None]:
import pandas as pd

df = pd.read_csv(r"billionaire.csv")
pd.set_option("display.max_columns", None)

# Group results by gender 
# What insights does this grouping give you?
print("Grouped by Gender")
print("------------------------")


# Group results by industry 
# What insights does this grouping give you?
print()
print()
print("Grouped by Industry")
print("------------------------")


# Sort by age (increasing)
print()
print()
print("Age (Ascending)")
print("------------------------")


# Sort by net_worth (decreasing)
print()
print()
print("Net Worth (Ascending = False)")
print("------------------------")


# Sort by age (decreasing) and then net_worth (decreasing)
print()
print()
print("Age and Net Worth (Ascending = False)")
print("------------------------")


# Problem 2 - Sorting Cereal

Using the cereal production dataset, complete the following tasks:

1. Sort the countries by using the 1990 cereal production values. You will want the highest production values to be listed first. Only print the Country Name and 1990 columns. Only print the first five rows.

2. Let’s determine who would get the “Most Improved” award for a ten-year span.

    - Create a function that will return the difference between two years’ values.

    - Create a new column listing the growth from the year 1990 to the year 2000.

    - Sort by your new column. You will want the highest growth values to be listed first. Only print the Country Name and the new growth columns. Only print the first five rows.

3. Let’s compare growth!

    - Again, sort the 1990 cereal production values as you did earlier, only this time add in the new growth column.

    - How does the growth of the highest producing countries compare with the others? What might this tell you?
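
One possible shape for the difference function and growth column in task 2 is sketched below on invented data. The country names, the production values, the function name `growth_between`, and the column name `growth_1990_2000` are all hypothetical. The sketch also assumes the year columns are labeled with the strings `"1990"` and `"2000"`, which is how `read_csv` normally reads header values.

```python
import pandas as pd

# Hypothetical rows; column names follow the task text, values are invented.
cereal = pd.DataFrame({
    "Country Name": ["Aland", "Borduria", "Carpania"],
    "1990": [100.0, 400.0, 250.0],
    "2000": [180.0, 420.0, 240.0],
})


def growth_between(frame, start_year, end_year):
    """Return end-year values minus start-year values for every row."""
    return frame[end_year] - frame[start_year]


# New column holding the change in production from 1990 to 2000.
cereal["growth_1990_2000"] = growth_between(cereal, "1990", "2000")

# Largest growth first.
most_improved = cereal.sort_values(by="growth_1990_2000", ascending=False)
print(most_improved[["Country Name", "growth_1990_2000"]].head())
```

Subtracting whole columns avoids looping over rows, and a negative result (as for the made-up Carpania row) indicates that production fell over the decade.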

In [None]:
import pandas as pd

df = pd.read_csv(r"data.csv")
pd.set_option("display.max_columns", None)
df.drop_duplicates(inplace=True)

# Sort countries by their 1990 cereal production values (decreasing)
print("1990 Cereal Production (Ascending = False)")
print("------------------------")


# Create a function that will return the difference between two years' values


# Create a new column listing the growth from year 1990 to year 2000    


# Sort by the growth of production from year 1990 to 2000 (decreasing)
print()
print()
print("Highest growth from 1990 to 2000")
print("------------------------")


# Sort the 1990 cereal production values (decreasing) and also list the growth from 1990 to 2000 (not sorted)
print()
print()
print("Ten year growth of highest 1990 production")
print("------------------------")


# Data Privacy
## How difficult is it to read and understand website and app privacy policies?
This graph, which shows the reading level and time (in minutes) required to read internet privacy policies for 150 popular websites and apps plus a few books, appeared elsewhere on NYTimes.com.

After looking closely at the graph, think about the three questions below. The questions are intended to build on one another, so try to answer them in order. Start with “I notice,” then “I wonder,” and end with a catchy headline.

1. What do you notice? If you make a claim, tell us what you noticed that supports your claim.

2. What do you wonder? What are you curious about that comes from what you notice in the graph?

3. What’s going on in this graph? Write a catchy headline that captures the graph’s main idea.

Source: Network, The Learning. “What’s Going On in This Graph? | Internet Privacy Policies.” The New York Times, The New York Times, 2 Jan. 2020, [https://www.nytimes.com/2020/01/02/learning/whats-going-on-in-this-graph-internet-privacy-policies.html](https://www.nytimes.com/2020/01/02/learning/whats-going-on-in-this-graph-internet-privacy-policies.html).

In [None]:
#answers go here

# What is Data Aggregation?
Read the article cited below while considering these questions.

1. Why is data aggregation necessary?

2. How can data aggregation be helpful for businesses?

3. For your project, what industry are you going to explore? Jot down ways that data aggregation may help you discover trends and make changes.

Source:
Import.io. “What Is Data Aggregation? Examples of Data Aggregation by Industry.” Import.io, 22 Oct. 2019, [https://www.import.io/post/what-is-data-aggregation-industry-examples/](https://www.import.io/post/what-is-data-aggregation-industry-examples/).

In [None]:
#answer goes here