New Products

You are given a table of product launches by company by year. 

Write a query to compute the net difference between the number of products each company launched in 2020 and the number it launched in the previous year. 

Output each company's name and the net difference in products launched in 2020 compared with the previous year.

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

In [3]:
car_launches = pd.read_csv("../CSV/car_launches.csv")
car_launches = car_launches.iloc[:, :3]
car_launches.head()

Unnamed: 0,year,company_name,product_name
0,2019,Toyota,Avalon
1,2019,Toyota,Camry
2,2020,Toyota,Corolla
3,2019,Honda,Accord
4,2019,Honda,Passport


In [4]:
df_2020 = car_launches[car_launches['year'].astype(str) == '2020']
df_2020

Unnamed: 0,year,company_name,product_name
2,2020,Toyota,Corolla
6,2020,Honda,Pilot
8,2020,Chevrolet,Trailblazer
9,2020,Chevrolet,Trax
11,2020,Chevrolet,Blazer
13,2020,Ford,Aspire
15,2020,Jeep,Wrangler
16,2020,Jeep,Cherokee
17,2020,Jeep,Compass


In [5]:
df_2019 = car_launches[car_launches['year'].astype(str) == '2019']
df_2019

Unnamed: 0,year,company_name,product_name
0,2019,Toyota,Avalon
1,2019,Toyota,Camry
3,2019,Honda,Accord
4,2019,Honda,Passport
5,2019,Honda,CR-V
7,2019,Honda,Civic
10,2019,Chevrolet,Traverse
12,2019,Ford,Figo
14,2019,Ford,Endeavour
18,2019,Jeep,Renegade


The following code joins two DataFrames (`df_2020` and `df_2019`) on the 'company_name' column using an outer join. Let's go through it step by step:

1. `pd.merge(...)`: Calls the DataFrame merge function.

2. `df_2020, df_2019`: The two DataFrames to be joined.

3. `how='outer'`: Sets the join type. Here it is an outer join, which keeps all rows from both DataFrames and fills in NaN where one DataFrame has no matching data.

4. `on=['company_name']`: Sets the column to join on, in this case 'company_name'.

5. `suffixes=['_2020', '_2019']`: Appends suffixes to column names that appear in both DataFrames so they can be told apart. For example, if both DataFrames have a column 'column1', the merged result contains 'column1_2020' and 'column1_2019'.

6. `.fillna(0)`: Replaces any missing values (NaN) produced by the join with zeros.

The result is a new DataFrame `df` that contains the data from both source DataFrames, joined on the 'company_name' column.

In [10]:
df = pd.merge(df_2020, df_2019, how='outer', on=[
    'company_name'], suffixes=['_2020', '_2019']).fillna(0)
df

Unnamed: 0,year_2020,company_name,product_name_2020,year_2019,product_name_2019
0,2020,Chevrolet,Trailblazer,2019,Traverse
1,2020,Chevrolet,Trax,2019,Traverse
2,2020,Chevrolet,Blazer,2019,Traverse
3,2020,Ford,Aspire,2019,Figo
4,2020,Ford,Aspire,2019,Endeavour
5,2020,Honda,Pilot,2019,Accord
6,2020,Honda,Pilot,2019,Passport
7,2020,Honda,Pilot,2019,CR-V
8,2020,Honda,Pilot,2019,Civic
9,2020,Jeep,Wrangler,2019,Renegade
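
The row pairing in this output can be reproduced on a smaller scale. The sketch below uses a hypothetical two-company sample (the frames `left` and `right` are illustrative names, not part of the notebook) to show that the join produces one row per pairing of a company's 2020 product with one of its 2019 products, and how the suffixes are applied.

```python
import pandas as pd

# Hypothetical sample, not the full car_launches data
left = pd.DataFrame({
    "year": [2020, 2020],
    "company_name": ["Ford", "Jeep"],
    "product_name": ["Aspire", "Wrangler"],
})
right = pd.DataFrame({
    "year": [2019, 2019, 2019],
    "company_name": ["Ford", "Ford", "Jeep"],
    "product_name": ["Figo", "Endeavour", "Renegade"],
})

# Ford: 1 product in 2020 x 2 in 2019 = 2 rows; Jeep: 1 x 1 = 1 row
merged = pd.merge(left, right, how="outer", on=["company_name"],
                  suffixes=("_2020", "_2019"))
print(merged)
```

The key column `company_name` keeps its name, while the overlapping `year` and `product_name` columns each appear twice with the two suffixes.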


In [11]:
df = df[df['product_name_2020'] != df['product_name_2019']]
df

Unnamed: 0,year_2020,company_name,product_name_2020,year_2019,product_name_2019
0,2020,Chevrolet,Trailblazer,2019,Traverse
1,2020,Chevrolet,Trax,2019,Traverse
2,2020,Chevrolet,Blazer,2019,Traverse
3,2020,Ford,Aspire,2019,Figo
4,2020,Ford,Aspire,2019,Endeavour
5,2020,Honda,Pilot,2019,Accord
6,2020,Honda,Pilot,2019,Passport
7,2020,Honda,Pilot,2019,CR-V
8,2020,Honda,Pilot,2019,Civic
9,2020,Jeep,Wrangler,2019,Renegade


In [12]:
df = df.groupby(['company_name']).agg(
    {'product_name_2020': 'nunique', 'product_name_2019': 'nunique'}).reset_index()
df

Unnamed: 0,company_name,product_name_2020,product_name_2019
0,Chevrolet,3,1
1,Ford,1,2
2,Honda,1,4
3,Jeep,3,2
4,Toyota,1,2


In [13]:
df['net_new_products'] = df['product_name_2020'] - df['product_name_2019']
df

Unnamed: 0,company_name,product_name_2020,product_name_2019,net_new_products
0,Chevrolet,3,1,2
1,Ford,1,2,-1
2,Honda,1,4,-3
3,Jeep,3,2,1
4,Toyota,1,2,-1


In [14]:
result = df[['company_name', 'net_new_products']]

In [15]:
result

Unnamed: 0,company_name,net_new_products
0,Chevrolet,2
1,Ford,-1
2,Honda,-3
3,Jeep,1
4,Toyota,-1


Solution Walkthrough
In this problem, we are given a table of product launches by company by year. Our goal is to compute, for each company, the difference between the number of products launched in 2020 and the number launched in the previous year, and to output that difference alongside the company name.

To solve this problem, we can use the pandas library in Python. The notebook cells above perform the necessary operations.

Let's go through the code step by step to understand how it solves the problem.

Understanding The Data
The data is a table of product launches by company by year. It contains the following columns:

company_name: The name of the company
year: The year of the product launch
product_name: The name of the product launched

The Problem Statement
For each company, we need the number of products launched in 2020 minus the number launched in 2019, output together with the company name.

Breaking Down The Code
Let's break down the code from the notebook cells above:

import pandas as pd
import numpy as np
from datetime import datetime

df_2020 = car_launches[car_launches["year"].astype(str) == "2020"]
df_2019 = car_launches[car_launches["year"].astype(str) == "2019"]

Here, we import the libraries (only pandas is used in the steps that follow; numpy and datetime are imported but not needed) and create two dataframes, df_2020 and df_2019. They contain the rows of the car_launches dataframe whose year column, converted to a string, equals "2020" and "2019", respectively.
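
As a small illustration of the year filter, the sketch below assumes the year column is stored as integers (as it appears in the output above) and checks that the string comparison and a direct integer comparison select the same rows. The frame `launches` is a hypothetical three-row sample.

```python
import pandas as pd

# Hypothetical sample with an integer year column
launches = pd.DataFrame({
    "year": [2019, 2020, 2020],
    "company_name": ["Toyota", "Toyota", "Honda"],
    "product_name": ["Camry", "Corolla", "Pilot"],
})

# Filter as in the walkthrough: compare the year as a string
as_str = launches[launches["year"].astype(str) == "2020"]

# Equivalent filter when the column is already numeric
as_int = launches[launches["year"] == 2020]

print(as_str.equals(as_int))
```

The string conversion is the more defensive choice when it is unclear whether the CSV was parsed with a numeric or a text year column.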

df = pd.merge(
    df_2020,
    df_2019,
    how="outer",
    on=["company_name"],
    suffixes=["_2020", "_2019"],
).fillna(0)
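
Next, we merge the df_2020 and df_2019 dataframes on the 'company_name' column using an outer join. Because a company can have several rows in each year, the join produces one row for every pairing of a company's 2020 product with one of its 2019 products. The overlapping year and product_name columns receive the _2020 and _2019 suffixes, and fillna(0) replaces any missing values with 0.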
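
One edge case may be worth checking before reusing this step on other data. In the sample every company has launches in both years, so the join leaves nothing missing. If a company had launches in only one year, its product name for the other year would be missing, and filling it with 0 turns it into a value that nunique counts later. The sketch below isolates that behavior on a hypothetical column; `missing` is an illustrative name, and the object dtype is set explicitly as an assumption about how the column is stored.

```python
import pandas as pd

# Hypothetical product_name_2019 values for a company with no 2019 launches:
# the outer join leaves them missing.
missing = pd.Series([None, None], dtype=object)

# nunique skips missing values by default
count_without_fill = missing.nunique()

# After fillna(0), the placeholder 0 is counted as one distinct value
count_with_fill = missing.fillna(0).nunique()

print(count_without_fill, count_with_fill)
```

Under that assumption, leaving the missing values in place until after the counts are computed would keep such a company's count for the empty year at zero.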

df = df[df["product_name_2020"] != df["product_name_2019"]]
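
We filter the df dataframe to keep only the rows where the 2020 product name differs from the 2019 product name. This drops any row that pairs a product with itself, which can only happen when the same name appears in both years. In the output shown above, the filter leaves the rows unchanged.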
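
To make the effect of this filter concrete, here is a minimal sketch on hypothetical merged rows in which one product name ("Model A", an invented name) appears in both years. Only the row pairing the product with itself is removed.

```python
import pandas as pd

# Hypothetical merged rows: "Model A" was launched in both years
pairs = pd.DataFrame({
    "company_name": ["ExampleCo", "ExampleCo"],
    "product_name_2020": ["Model A", "Model A"],
    "product_name_2019": ["Model A", "Model B"],
})

# Same filter as in the walkthrough
kept = pairs[pairs["product_name_2020"] != pairs["product_name_2019"]]
print(kept)
```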

df = (
    df.groupby(["company_name"])
    .agg(
        {
            "product_name_2020": "nunique",
            "product_name_2019": "nunique",
        }
    )
    .reset_index()
)

We group the filtered dataframe df by 'company_name' and count the unique product names for 2020 and for 2019 using the nunique aggregation. Counting unique names rather than rows matters here, because the join repeats each product once per pairing. The result overwrites df.
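
The difference between counting rows and counting unique names can be seen on the Honda rows from the merged output. The sketch below rebuilds just those four rows by hand (the frame name `pairs` is illustrative) and compares the two counts.

```python
import pandas as pd

# Honda rows as they appear after the join: 1 product in 2020, 4 in 2019
pairs = pd.DataFrame({
    "company_name": ["Honda"] * 4,
    "product_name_2020": ["Pilot"] * 4,
    "product_name_2019": ["Accord", "Passport", "CR-V", "Civic"],
})

# Unique names per year, as in the walkthrough
counts = (
    pairs.groupby("company_name")
    .agg({"product_name_2020": "nunique", "product_name_2019": "nunique"})
    .reset_index()
)

# Plain row counts would report 4 for both years
row_counts = pairs.groupby("company_name").size()

print(counts)
print(row_counts)
```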

df["net_new_products"] = (
    df["product_name_2020"] - df["product_name_2019"]
)
result = df[["company_name", "net_new_products"]]

We calculate the net difference by subtracting the number of products launched in 2019 from the number launched in 2020 and store it in a new column, 'net_new_products', in the df dataframe. We then select only the 'company_name' and 'net_new_products' columns and assign them to a new dataframe, result.

Bringing It All Together
Taken together, the steps produce the desired result: for each company, the number of products launched in 2020 minus the number launched in the previous year. The resulting dataframe result contains the 'company_name' column and the net difference in the 'net_new_products' column.
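
The same net difference can also be reached without a join. The sketch below is an alternative, not the approach used in the notebook: it counts unique products per company and year, then pivots the years into columns. The sample frame `launches` and the company "ExampleCo" with its products are hypothetical, added to show a company with no 2019 launches.

```python
import pandas as pd

# Hypothetical sample; "ExampleCo" is invented and has no 2019 launches
launches = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2020],
    "company_name": ["Toyota", "Toyota", "Toyota", "ExampleCo", "ExampleCo"],
    "product_name": ["Avalon", "Camry", "Corolla", "Model A", "Model B"],
})

# Unique products per company and year, with years pivoted into columns
per_year = (
    launches.groupby(["company_name", "year"])["product_name"]
    .nunique()
    .unstack(fill_value=0)
)

# Make sure both year columns exist even if one year has no rows at all
per_year = per_year.reindex(columns=[2019, 2020], fill_value=0)

per_year["net_new_products"] = per_year[2020] - per_year[2019]
alt_result = per_year["net_new_products"].reset_index()
print(alt_result)
```

Here the zero fill is applied to the counts rather than to the product names, so a company that is missing from one year contributes a count of 0 for that year.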

Conclusion
The code computes the net difference in product launches between 2020 and the previous year for each company, using the pandas library in Python to filter, join, group, and subtract.