![Image](https://sawaed19.net/wp-content/uploads/2021/01/700600p546EDNmainimg-process-change-management1.jpg)

[Image Source](https://sawaed19.net/en/event/workshop-youth-for-change/)

# Introduction

Hey, thanks for viewing my Kernel!

If you like my work, please leave an upvote: it will be really appreciated and will motivate me to offer more content to the Kaggle community! 😊

The objective of this notebook is to explore how data science changed from 2020 to 2021. To do so, we worked with two datasets, [kaggle_survey_2021](https://www.kaggle.com/c/kaggle-survey-2021) and [kaggle_survey_2020](https://www.kaggle.com/c/kaggle-survey-2020/overview). The 2020 survey has 39+ questions and 20,036 responses, and the 2021 survey has 42+ questions and 25,973 responses.

The notebook consists of 5 parts.
1. Introduction
2. Data Preparation
3. Data Cleaning
4. Data Analysis
5. Conclusion

In the introduction, we start by importing the libraries and loading the datasets.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings


warnings.simplefilter("ignore")
sns.set()

df21 = pd.read_csv('../input/kaggle-survey-2021/kaggle_survey_2021_responses.csv')
df20 = pd.read_csv('../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv')

In [None]:
print(df20.shape, df21.shape)

In [None]:
print(df20.columns)
print(df21.columns)

# Data Preparation

In this part, we create 3 functions that simplify the datasets.

In the datasets, some questions span more than one column, and the **group_cols** function groups the columns by question. For example, Q24 is one group, and Q12_Part_1, Q12_Part_2, Q12_Part_3, Q12_OTHER together form another group.
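As a rough illustration of the grouping idea (not the notebook's own function), the sketch below groups a list of column names by the text before the first underscore. The helper name `group_by_question` and the column list are assumptions made up for this example.

```python
from collections import defaultdict


def group_by_question(columns):
    """Group column names by the question prefix before the first underscore."""
    groups = defaultdict(list)
    for col in columns:
        groups[col.split('_')[0]].append(col)
    return dict(groups)


# Hypothetical column names in the style of the survey files
columns = ['Q1', 'Q24', 'Q12_Part_1', 'Q12_Part_2', 'Q12_Part_3', 'Q12_OTHER']
groups = group_by_question(columns)

print(groups['Q12'])  # ['Q12_Part_1', 'Q12_Part_2', 'Q12_Part_3', 'Q12_OTHER']
print(groups['Q24'])  # ['Q24']
```

Single-column questions such as Q24 simply end up as a group of one.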

The **part_cols_convert** function converts questions that span more than one column into a single column. For instance, it converts Q12_Part_1, Q12_Part_2, Q12_Part_3, Q12_OTHER into one Q12 column.
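The same kind of conversion can be sketched with `DataFrame.melt`, assuming each part column holds either its choice label or a missing value. This is an alternative illustration on made-up data, not the notebook's implementation, which reads the label from the question text in the header row.

```python
import pandas as pd

# Hypothetical multi-select answers: each cell is the choice label or missing
answers = pd.DataFrame({
    'Q12_Part_1': ['GPUs', None, 'GPUs'],
    'Q12_Part_2': [None, 'TPUs', 'TPUs'],
    'Q12_OTHER': [None, None, 'Other'],
})

# Stack all part columns into one 'Q12' column and drop the unselected cells
long_q12 = (
    answers.melt(value_name='Q12')['Q12']
    .dropna()
    .reset_index(drop=True)
)

print(long_q12.tolist())  # ['GPUs', 'GPUs', 'TPUs', 'TPUs', 'Other']
```

Each selected choice becomes one row, so counting the values of the new column gives the number of respondents per choice.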

The last function, **dict_preparation**, matches the same questions in 2020 and 2021. Of course, some questions have the same meaning but different wording in the two datasets. We solved that kind of problem with manual correction. For example, Q12 is "Which types of specialized hardware do you use on a regular basis?  (Select all that apply) - Selected Choice - GPUs" in 2020 and "Which types of specialized hardware do you use on a regular basis?  (Select all that apply) - Selected Choice -  NVIDIA GPUs" in 2021.

After all the preparation, we combine the 2020 and 2021 surveys with the **prepare_data** function.
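The matching step can be sketched with a reverse lookup from question text to column name. The dictionaries below are assumptions for illustration: the first two question texts are placeholders, and only the Q12 texts are the ones quoted in the paragraph before this example.

```python
# Header rows as {column: question text}
questions_2020 = {
    'Q1': 'Hypothetical question A',
    'Q20': 'Hypothetical question B',
    'Q12': ('Which types of specialized hardware do you use on a regular basis?  '
            '(Select all that apply) - Selected Choice - GPUs'),
}
questions_2021 = {
    'Q1': 'Hypothetical question A',
    'Q21': 'Hypothetical question B',
    'Q12': ('Which types of specialized hardware do you use on a regular basis?  '
            '(Select all that apply) - Selected Choice -  NVIDIA GPUs'),
}

# Reverse lookup: question text -> 2021 column name
text_to_col_2021 = {text: col for col, text in questions_2021.items()}

same_questions = {}
unmatched = []
for col_20, text in questions_2020.items():
    if text in text_to_col_2021:
        same_questions[col_20] = text_to_col_2021[text]
    else:
        unmatched.append(col_20)  # wording differs, needs manual correction

print(same_questions)  # {'Q1': 'Q1', 'Q20': 'Q21'}
print(unmatched)       # ['Q12']
```

Questions whose wording changed between the years end up in `unmatched`, which is where a manual mapping is needed.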

In [None]:
def group_cols(df):
    cols = df.columns
    
    col_part = []
    for col in cols:
        if '_' in col:
            col_part.append(col)
    
    cols_1 = list(set(cols) - set(col_part))
    
    temp_df = pd.DataFrame(col_part)
    temp_df['question'] = temp_df[0].str.split('_').str[0]
    temp_group = temp_df.groupby('question')[0]
    
    cols_2 = []
    for name, group in temp_group:
        if len(group) > 1:
            cols_2.append(list(group.values))
    
    return list(cols_1 + cols_2)

In [None]:
def part_cols_convert(df):
    cols = df.columns
    
    col_part = []
    for col in cols:
        if '_' in col:
            col_part.append(col)
    
    temp_df = pd.DataFrame(col_part)
    temp_df['question'] = temp_df[0].str.split('_').str[0]
    temp_group = temp_df.groupby('question')[0]
    
    cols_2 = []
    for name, group in temp_group:
        if len(group) > 1:
            cols_2.append(list(group.values))
    
    part_df_list = []
    for cols in cols_2:
        part_df = pd.DataFrame()
        new_col = cols[0].split('_')[0]
        
        values_list = []
        for col in cols:
            str_value = df.loc[0, col].split('-')[-1]
            # Count of the most frequent value; .iloc makes the positional access explicit
            count_num = df[col].value_counts().iloc[0]
            values = [str_value for i in range(count_num)]
            values_list.extend(values)
        
        part_df[new_col] = values_list
        part_df_list.append(part_df)
    
    # axis must be passed by keyword in recent pandas versions
    df_parts = pd.concat(part_df_list, axis=1)
    return df_parts

In [None]:
def dict_preparation(question_2020, question_2021, df20, df21):
    same_questions_dict = {}
    question_mean_dict = {}

    for c_20 in question_2020:
        if type(c_20) is list:
            c_20 = c_20[0]
            question_mean_dict[c_20.split('_')[0]] = df20.loc[0, c_20]
            #print('c_20:', c_20 , df20.loc[0, c_20])
        else:
            question_mean_dict[c_20] = df20.loc[0, c_20]
        q_20 = df20.loc[0, c_20]

        for c_21 in question_2021:
            if type(c_21) is list:
                c_21 = c_21[0]
            q_21 = df21.loc[0, c_21]
            #print('c_21:', c_21, q_21)
            if q_21 == q_20:
                if '_' in c_20:
                    if '_' in c_21:
                        same_questions_dict[c_20.split('_')[0]] = c_21.split('_')[0]
                    else:
                        same_questions_dict[c_20.split('_')[0]] = c_21
                else:
                    if '_' in c_21:
                        same_questions_dict[c_20] = c_21.split('_')[0]
                    else:
                        same_questions_dict[c_20] = c_21
                break
    return same_questions_dict, question_mean_dict

In [None]:
df20_parts = part_cols_convert(df20)
df21_parts = part_cols_convert(df21)

question_2020 = group_cols(df20)
question_2021 = group_cols(df21)

same_questions_dict, question_mean_dict = dict_preparation(question_2020, question_2021, df20, df21)

print(same_questions_dict)
print(question_mean_dict)

In [None]:
print(question_2020)
print(question_2021)

In [None]:
from termcolor import colored

diff_questions_20_list = ['Q12_Part_1', 'Q27_A_Part_1', 'Q27_B_Part_1', 'Q28_A_Part_1', 'Q28_B_Part_1', 'Q36_Part_1']
diff_questions_21_list = ['Q12_Part_1', 'Q27_A_Part_1', 'Q27_B_Part_1', 'Q28', 'Q36_A_Part_1', 'Q36_B_Part_1']
more_questions_list = ['Q40_Part_1', 'Q41', 'Q42_Part_1']

print(colored('1) ', 'green'), df20.loc[0, 'Q12_Part_1'], ' - ', df21.loc[0, 'Q12_Part_1'])
print(colored('2) ', 'green'), df20.loc[0, 'Q27_A_Part_1'], ' - ', df21.loc[0, 'Q27_A_Part_1'])
print(colored('3) ', 'green'), df20.loc[0, 'Q27_B_Part_1'], ' - ', df21.loc[0, 'Q27_B_Part_1'])

print(colored('3.5) ', 'green'), df20.loc[0, 'Q26_A_Part_1'], ' - ', df21.loc[0, 'Q27_A_Part_1'])

print(colored('4) ', 'red'), df20.loc[0, 'Q28_A_Part_1'], ' - ', df21.loc[0, 'Q28'])
print(colored('5) ', 'red'), df20.loc[0, 'Q28_B_Part_1'], ' - ', df21.loc[0, 'Q28'])
print(colored('6) ', 'red'), df20.loc[0, 'Q36_Part_1'], ' - ', df21.loc[0, 'Q36_A_Part_1'])
print(colored('7) ', 'red'), df20.loc[0, 'Q36_Part_1'], ' - ', df21.loc[0, 'Q36_B_Part_1'])

print(colored('8) ', 'blue'), df21.loc[0, 'Q40_Part_1'])
print(colored('9) ', 'blue'), df21.loc[0, 'Q41'])
print(colored('10) ', 'blue'), df21.loc[0, 'Q42_Part_1'])

same_questions_dict['Q12'] = 'Q12'

In [None]:
not_exist_2020 = ['Q27', 'Q28', 'Q36']
not_exist_2021 = ['Q20', 'Q28', 'Q29', 'Q30', 'Q31', 'Q39']

print(colored('1) 2020 --- ', 'blue'), df20.loc[0, 'Q27_A_Part_1'])
print(colored('2) 2020 --- ', 'blue'), df20.loc[0, 'Q28_A_Part_1'])
print(colored('3) 2020 --- ', 'blue'), df20.loc[0, 'Q36_Part_1'])
print()
print(colored('1) 2021 --- ', 'red'), df21.loc[0, 'Q20'])
print(colored('2) 2021 --- ', 'red'), df21.loc[0, 'Q28'])
print(colored('3) 2021 --- ', 'red'), df21.loc[0, 'Q29_A_Part_1'])
print(colored('4) 2021 --- ', 'red'), df21.loc[0, 'Q30_A_Part_1'])
print(colored('5) 2021 --- ', 'red'), df21.loc[0, 'Q31_A_Part_1'])
print(colored('6) 2021 --- ', 'red'), df21.loc[0, 'Q39_Part_1'])

same_questions_dict['Q27'] = 'Q29'
same_questions_dict['Q28'] = 'Q31'
same_questions_dict['Q36'] = 'Q39'

In [None]:
def prepare_data(same_questions_dict, df20, df21, df20_parts, df21_parts):
    cols_20, cols_21 = [], []
    part_cols_20, part_cols_21 = [], []
    for key in same_questions_dict.keys():
        if key in df20_parts.columns:
            part_cols_20.append(key)
            part_cols_21.append(same_questions_dict[key])
        else:
            cols_20.append(key)
            cols_21.append(same_questions_dict[key])
    
    df20['years'] = 2020
    df21['years'] = 2021
    df20_parts['years'] = 2020
    df21_parts['years'] = 2021
    
    cols_20.append('years')
    cols_21.append('years')
    part_cols_20.append('years')
    part_cols_21.append('years')
    
    temp_df21 = df21[cols_21]
    temp_df21.columns = cols_20
    temp_df21_parts = df21_parts[part_cols_21]
    temp_df21_parts.columns = part_cols_20
    
    df_20_21 = pd.concat([df20[cols_20].loc[1:, :], temp_df21.loc[1:, :]], join='outer')
    df_part_20_21 = pd.concat([df20_parts[part_cols_20].loc[1:, :], temp_df21_parts.loc[1:, :]], join='outer')
    
    return df_20_21, df_part_20_21

In [None]:
df_20_21, df_part_20_21 = prepare_data(same_questions_dict, df20, df21, df20_parts, df21_parts)

print(df_20_21.shape)
print(df_part_20_21.shape)

# Data Cleaning

Data cleaning is one of the most important parts of data science. As with most datasets, this dataset needs cleaning. In my view, in 2021 some answers were split, such as 'Product/Project Manager' into 'Program/Project Manager' and 'Product Manager', and some answers were corrected, such as PostgresSQL to PostgreSQL. Below, we try to match the same answers across the two years.
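One detail worth keeping in mind when matching answers: depending on the pandas version, `Series.str.replace` may interpret the pattern as a regular expression, in which case characters such as `$`, `(` and `)` are not taken literally. A minimal sketch on made-up salary labels, passing `regex=False` explicitly and using a mapping for whole-value replacements:

```python
import pandas as pd

# Hypothetical salary labels in the style of the survey answers
salary = pd.Series(['$0-999', '300,000-499,999', '>$1,000,000'])

# regex=False treats '$' as a literal character, not as an end-of-string anchor
no_dollar = salary.str.replace('$', '', regex=False)

# Series.replace with a dict swaps whole values, keeping all merged bins in one place
bin_map = {
    '300,000-499,999': '300,000-500,000',
    '>1,000,000': '> 500,000',
}
cleaned = no_dollar.replace(bin_map)

print(cleaned.tolist())  # ['0-999', '300,000-500,000', '> 500,000']
```

Note that `Series.replace` matches entire values, while `Series.str.replace` matches substrings, so the two are not interchangeable in every case.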

In [None]:
df_20_21_clean = df_20_21.copy()
df_part_20_21_clean = df_part_20_21.copy()

df_20_21_clean['Q6'] = df_20_21_clean['Q6'].str.replace('1-3 years', '1-2 years')
df_20_21_clean['Q30'] = df_20_21_clean['Q30'].str.replace('PostgresSQL', 'PostgreSQL')
df_20_21_clean['Q4'] = df_20_21_clean['Q4'].str.replace('Professional degree', 'Professional doctorate')
df_20_21_clean['Q5'] = df_20_21_clean['Q5'].str.replace('Program/Project Manager', 'Product/Project Manager')
df_20_21_clean['Q5'] = df_20_21_clean['Q5'].str.replace('Product Manager', 'Product/Project Manager')
df_20_21_clean['Q24'] = df_20_21_clean['Q24'].str.replace('$', '', regex=False)  # literal '$', not a regex anchor
df_20_21_clean['Q24'] = df_20_21_clean['Q24'].str.replace('300,000-499,999', '300,000-500,000')
df_20_21_clean['Q24'] = df_20_21_clean['Q24'].str.replace('500,000-999,999', '> 500,000')
df_20_21_clean['Q24'] = df_20_21_clean['Q24'].str.replace('>1,000,000', '> 500,000')
df_20_21_clean['Q11'] = df_20_21_clean['Q11'].str.replace('A personal computer / desktop', 'A personal computer or laptop')
df_20_21_clean['Q11'] = df_20_21_clean['Q11'].str.replace('A laptop', 'A personal computer or laptop')

df_part_20_21_clean['Q10'] = df_part_20_21_clean['Q10'].str.replace('  Amazon Sagemaker Studio Notebooks ', '  Amazon Sagemaker Studio ')
df_part_20_21_clean['Q10'] = df_part_20_21_clean['Q10'].str.replace('\n', '')
df_part_20_21_clean['Q10'] = df_part_20_21_clean['Q10'].str.replace(' Google Cloud Datalab Notebooks', ' Google Cloud Datalab')
df_part_20_21_clean['Q10'] = df_part_20_21_clean['Q10'].str.replace(' Google Cloud AI Platform Notebooks ', ' Google Cloud Notebooks (AI Platform / Vertex AI) ')
df_part_20_21_clean['Q29'] = df_part_20_21_clean['Q29'].str.replace('PostgresSQL', 'PostgreSQL')
df_part_20_21_clean['Q29'] = df_part_20_21_clean['Q29'].str.replace('\n', '')
df_part_20_21_clean['Q29'] = df_part_20_21_clean['Q29'].str.replace(' Microsoft Azure SQL Database ', ' Microsoft Azure Data Lake Storage ')
df_part_20_21_clean['Q29'] = df_part_20_21_clean['Q29'].str.replace(' Microsoft Azure Cosmos DB ', ' Microsoft Azure Data Lake Storage ')
df_part_20_21_clean['Q33'] = df_part_20_21_clean['Q33'].str.replace('\n', '')
df_part_20_21_clean['Q33'] = df_part_20_21_clean['Q33'].str.replace('(', '', regex=False)  # literal parenthesis
df_part_20_21_clean['Q33'] = df_part_20_21_clean['Q33'].str.replace(')', '', regex=False)
df_part_20_21_clean['Q33'] = df_part_20_21_clean['Q33'].str.replace(' Automation of full ML pipelines e.g. Google AutoML, H2O Driverless AI', 
                                                        ' Automation of full ML pipelines AutoML')
df_part_20_21_clean['Q33'] = df_part_20_21_clean['Q33'].str.replace(' Automation of full ML pipelines e.g. Google Cloud AutoML, H2O Driverless AI', 
                                                        ' Automation of full ML pipelines Cloud AutoML')
df_part_20_21_clean['Q33'] = df_part_20_21_clean['Q33'].str.replace(' Automation of full ML pipelines e.g. Google AutoML, H20 Driverless AI', 
                                                        ' Automation of full ML pipelines AutoML')
df_part_20_21_clean['Q33'] = df_part_20_21_clean['Q33'].str.replace(' Automation of full ML pipelines e.g. Google Cloud AutoML, H20 Driverless AI', 
                                                        ' Automation of full ML pipelines Cloud AutoML')
df_part_20_21_clean['Q34'] = df_part_20_21_clean['Q34'].str.replace('  H20 Driverless AI  ', '  H2O Driverless AI  ')
df_part_20_21_clean['Q9'] = df_part_20_21_clean['Q9'].str.replace('  Visual Studio / Visual Studio Code ', '  VisualStudio ')
df_part_20_21_clean['Q9'] = df_part_20_21_clean['Q9'].str.replace('(', '', regex=False)  # literal parenthesis
df_part_20_21_clean['Q9'] = df_part_20_21_clean['Q9'].str.replace(')', '', regex=False)
df_part_20_21_clean['Q9'] = df_part_20_21_clean['Q9'].str.replace('  Visual Studio Code VSCode ', '  VisualStudio ')
df_part_20_21_clean['Q9'] = df_part_20_21_clean['Q9'].str.replace('  Visual Studio ', '  VisualStudio ')
df_part_20_21_clean['Q9'] = df_part_20_21_clean['Q9'].str.replace('  VisualStudio ', '  Visual Studio / Visual Studio Code ')
df_part_20_21_clean['Q9'] = df_part_20_21_clean['Q9'].str.replace(' Jupyter (JupyterLab, Jupyter Notebooks, etc) ', '  Jupyter Notebook')
df_part_20_21_clean['Q9'] = df_part_20_21_clean['Q9'].str.replace('  Jupyter Notebook', ' Jupyter (JupyterLab, Jupyter Notebooks, etc) ')
df_part_20_21_clean['Q12'] = df_part_20_21_clean['Q12'].str.replace('  Google Cloud TPUs ', ' TPUs')
df_part_20_21_clean['Q12'] = df_part_20_21_clean['Q12'].str.replace('  NVIDIA GPUs ', ' GPUs')
df_part_20_21_clean['Q27'] = df_part_20_21_clean['Q27'].str.replace('  Amazon Elastic Container Service ', '  Amazon Elastic Compute Cloud (EC2) ')
df_part_20_21_clean['Q27'] = df_part_20_21_clean['Q27'].str.replace('  Microsoft Azure Container Instances ', '  Microsoft Azure Virtual Machines ')


#print(df_20_21_clean['Q11'].unique())

# Data Analysis

In this part, we plot all questions to get a visual overview. We created 2 functions.

The **long_sentences_seperate** function is used for visual editing. For instance, if a question or answer text is too long to plot, this function splits the text by inserting '\n'.

The **barplot_all_cols** function plots all columns. The 'years' column is used as the hue, so each year gets its own bar color.
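The wrapping idea can be sketched as follows. The helper name `wrap_every_n_words` is an assumption for this example; it differs slightly from the notebook's function in that it leaves no trailing space before each line break.

```python
def wrap_every_n_words(text, step=10):
    """Insert a line break after every `step` words (hypothetical helper)."""
    if not isinstance(text, str):
        return text  # leave NaN and other non-string values untouched
    words = text.split()
    lines = [' '.join(words[i:i + step]) for i in range(0, len(words), step)]
    return '\n'.join(lines)


wrapped = wrap_every_n_words('one two three four five six seven', step=3)
print(wrapped)
# one two three
# four five six
# seven
```

Checking the type up front avoids relying on an exception for missing answers.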
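Before plotting, the counts per answer are normalized by the number of responses in each year, and the answers are ordered by their relative change from 2020 to 2021. A minimal sketch of that computation on made-up data (the column `Q7`, the answers, and the response totals are assumptions for illustration):

```python
import pandas as pd

# Hypothetical answers with a 'years' column
df = pd.DataFrame({
    'Q7': ['Python', 'Python', 'R', 'Python', 'Python', 'Python', 'R', 'R'],
    'years': [2020, 2020, 2020, 2021, 2021, 2021, 2021, 2021],
})
responses = pd.Series({2020: 3, 2021: 5})  # assumed respondents per year

# Rows: answers, columns: years, values: number of selections
counts = df.groupby(['Q7', 'years']).size().unstack('years')

# Share of respondents per year who selected each answer
shares = counts.div(responses, axis=1)

# Relative change in raw counts from 2020 to 2021, largest first
change = (counts[2021] - counts[2020]) / counts[2020]
order = change.sort_values(ascending=False).index.tolist()

print(order)  # ['R', 'Python']
```

Here R doubles (1 to 2) while Python grows by half (2 to 3), so R is listed first.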

In [None]:
def long_sentences_seperate(sentence, step=10):
    try:
        splittext = sentence.split(" ")
        for x in range(step, len(splittext), step):
            splittext[x] = "\n"+splittext[x].lstrip()
        text = " ".join(splittext)
        return text
    except AttributeError:  # non-string input such as NaN has no .split()
        return sentence

In [None]:
def barplot_all_cols(df, question_mean_dict, df_cols, figsize=(24, 96)):
    response_num_2020 = df20.shape[0]
    response_num_2021 = df21.shape[0]
    
    ncols = 2
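    nrows = -(-len(df_cols) // ncols)  # ceiling division, so an odd number of columns still gets a last row
    fig, axes = plt.subplots(nrows, ncols, figsize=figsize, squeeze=False)  # squeeze=False keeps axes 2-D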
    plt.subplots_adjust(hspace=0.3)
    
    index = 0
    for row in range(nrows):
        for col in range(ncols):
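            try:
                col_name = df_cols[index]
                question = question_mean_dict[col_name]
                question = long_sentences_seperate(question, step=10)
            except (IndexError, KeyError):
                # No column left, or no question text for this column: hide the axis
                axes[row][col].set_visible(False)
                index += 1  # move on so one missing key does not hide all remaining plots
                continue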
            
            if col_name == 'Q3':
                selected_countries = df[col_name].value_counts(normalize=True).index[:10]
                temp_df = df[df[col_name].isin(selected_countries)]
                
                temp_df = temp_df.groupby([col_name, 'years']).agg({col_name:'count'})
                temp_df.columns = ['counts']
                temp_df.reset_index(inplace=True)
            else:
                temp_df = df.groupby([col_name, 'years']).agg({col_name:'count'})
                temp_df.columns = ['counts']
                temp_df.reset_index(inplace=True)
            
            temp_df.loc[temp_df['years'] == 2020, 'counts_norm'] = temp_df.loc[temp_df['years'] == 2020, 'counts'] / response_num_2020
            temp_df.loc[temp_df['years'] == 2021, 'counts_norm'] = temp_df.loc[temp_df['years'] == 2021, 'counts'] / response_num_2021
            temp_df[col_name] = temp_df[col_name].apply(lambda x: long_sentences_seperate(x, step=4))
            
            ### Find The Order That Biggest Change to Lowest Change
            count_df = temp_df[col_name].value_counts()
            selected_values = list(count_df[count_df > 1].index)
            # .copy() so the 'change' column is added to an independent frame, not a view
            clean_temp_df = temp_df[temp_df[col_name].isin(selected_values)].copy()
            changes_list = ((clean_temp_df.loc[clean_temp_df['years'] == 2021, 'counts'].values - clean_temp_df.loc[clean_temp_df['years'] == 2020, 'counts'].values) / 
                            clean_temp_df.loc[clean_temp_df['years'] == 2020, 'counts'].values)
            change_twice_list = []
            for value in changes_list:
                change_twice_list.append(value)
                change_twice_list.append(value)
            clean_temp_df['change'] = change_twice_list
            clean_temp_df.sort_values('change', inplace=True, ascending=False)
            order_list = list(clean_temp_df[col_name].unique())
            temp_df_unique = temp_df[col_name].unique()
            diff_order = list(set(temp_df_unique) - set(order_list))
            if len(diff_order) > 0:
                order_list.extend(diff_order)
            ###
            
            sns.barplot(data=temp_df, x='counts_norm', y=col_name, hue='years', order=order_list, ax=axes[row][col])
            axes[row][col].set_title(question)
            for p in axes[row][col].patches:
                txt = str(p.get_width().round(3))
                txt_x = p.get_width() 
                txt_y = p.get_y() + p.get_height() * 2 / 5
                bar_color = p.get_facecolor()
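                # Default label so txt_count is always defined,
                # even if the bar color matches neither default palette color
                txt_count = ''
                try:
                    if bar_color == (0.34705882352941175, 0.4588235294117645, 0.6411764705882353, 1.0):
                        txt_count = str(round(p.get_width() * response_num_2020))
                    elif bar_color == (0.7985294117647057, 0.536764705882353, 0.38970588235294135, 1.0):
                        txt_count = str(round(p.get_width() * response_num_2021))
                except ValueError:
                    # round() fails on NaN widths (answer missing in one year)
                    txt_count = 0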
                txt_count_y = p.get_y() + p.get_height() * 4 / 5
                axes[row][col].text(txt_x,txt_y,txt,color=bar_color)
                axes[row][col].text(txt_x,txt_count_y,txt_count,color=bar_color)
            
            index += 1

In [None]:
DS_col = ['Q11', 'Q32', 'Q3', 'Q4', 'Q1', 'Q38', 'Q13', 'Q30', 'Q6', 'Q25', 'Q5', 'Q8']

barplot_all_cols(df_20_21_clean, question_mean_dict, DS_col)

In [None]:
barplot_all_cols(df_part_20_21_clean, question_mean_dict, df_part_20_21_clean.columns, figsize=(24, 192))

# Conclusion
In 2021, usage of every cloud computing platform increased. The most popular platforms are AWS, GCP, and Microsoft Azure. Cloud computing will probably be used even more in 2022 ☁️.

Usage of business intelligence tools is also increasing. Microsoft Power BI rose from 462 to 790 and Tableau from 540 to 740. If this trend continues, Microsoft Power BI may be used much more than Tableau in the future.
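Because the number of responses also grew between the two surveys, it helps to look at relative growth and at shares rather than raw counts alone. A small sketch using only the figures quoted in this notebook (the helper name `relative_change` is made up, and dividing by the total number of responses is an assumption, since not every respondent was shown every question):

```python
def relative_change(old, new):
    """Relative change from old to new, e.g. 0.5 means +50%."""
    return (new - old) / old


responses = {2020: 20036, 2021: 25973}  # totals quoted in the introduction
tools = {'Microsoft Power BI': (462, 790), 'Tableau': (540, 740)}

for name, (count_20, count_21) in tools.items():
    growth = relative_change(count_20, count_21)
    share_20 = count_20 / responses[2020]
    share_21 = count_21 / responses[2021]
    print(f'{name}: {growth:.1%} more selections, '
          f'share of all responses {share_20:.2%} -> {share_21:.2%}')
```

The same helper can be applied to any other pair of counts mentioned in this section.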

In 2020, the most common age group in data science was 22–24. Now it is 18–21. Welcome, young data scientists 👋. The 70+ group also grew, from 76 to 128, so the age range of data scientists is getting wider. On the other hand, the "I have never written code" answer decreased even though the 18–21 group grew in 2021. We can say "the coding age has decreased" 🔥.
The number of data scientists increased all around the world. The biggest increase is in China 🌍.

For TPU usage, the "2–5 times" answer increased from 2012 to 3405, "6–25 times" from 424 to 947, and "25+" from 272 to 612. We can say "we will hear the name TPU much more in 2022". Usage of GPUs decreased from 8309 to 8035, while TPUs increased from 960 to 3451. In 2022, TPUs may be used more than GPUs 📱.

Respondents who have not graduated and those with a Bachelor's degree increased, but Professional doctorate decreased from 699 to 360, which is almost half. This may be an effect of the Kaggle survey itself: perhaps data scientists with professional doctorates stopped using Kaggle 📚.

In general, usage of all data products increased, with the biggest increase in MySQL. The article "What Are The Differences Between Data Scientists That Earn 500💲 And 225.000💲 Yearly?" also pointed out that databases are very important for data scientists.

In general, all job titles increased, although Business Analysts and Statisticians are roughly unchanged. There is also a new job title, Developer Relations/Advocacy 💼.

In hosted notebook products, Binder/JupyterHub decreased from 2072 to 1770, while Kaggle Notebooks increased from 5991 to 9506 and Colab Notebooks from 6329 to 9792. The biggest increase is in Google Cloud Notebooks (AI Platform / Vertex AI) 📓.
Usage of all data visualization libraries increased, and the biggest increase is in Seaborn 📊, from 8821 to 12586. Usage of all IDEs also increased: the biggest increase is in Visual Studio / Visual Studio Code, from 2445 to 14150, followed by Jupyter, from 11210 to 21720.

In ML frameworks, TensorFlow, PyTorch, and XGBoost all increased, but the biggest increases are in XGBoost (3935 to 5974), CatBoost (957 to 1512), and JAX (84 to 190). There are also new options: Huggingface and PyTorch Lightning. In computer vision, usage of all algorithms increased, but the biggest increases are in CNN (2003 to 2740) and GAN (1092 to 1492). In natural language processing (NLP), usage of all methods increased, but the biggest increase is in BERT (1428 to 2351).

Another question is about programming language recommendations. Here, only Swift decreased, and the biggest increase is in SQL. In programming language usage, Python 🐍 is the most popular, and the biggest increase is in JavaScript, from 2995 to 4332.
In the question about the most important part of work, the most common answer is analyzing and understanding data to influence product or business decisions, which rose from 6420 to 9107. 35 percent of data scientists gave this answer, and the biggest increase is in the "None of these activities" answer. This may point to a new role in data science that is not clearly defined yet.

In 2021, automated ML tools were mostly used for automating full ML pipelines, and this will probably still be the case in 2022. The biggest increases are in Databricks AutoML (948 to 1970) and Google Cloud AutoML (2839 to 5567). Google AutoML is also the most popular, and Amazon Sagemaker Autopilot and Azure Automated Machine Learning are new among the automated ML tools ⚙️.

In the question about courses, the biggest increase is in Kaggle Learn Courses, but this figure is not reliable because the survey is run by Kaggle. Among the other important platforms, certification programs (AWS, Azure, GCP, etc.) increased from 1076 to 1804 and LinkedIn Learning from 1617 to 2093. In general, spending on ML also increased 📚.

In the question about sharing or deployment, the most popular tool is GitHub, which increased from 3434 to 4586. The biggest increases are in Streamlit (186 to 387), Kaggle (1878 to 3065), and Colab (1247 to 1848).
