<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Histogram**


Estimated time needed: **45** minutes


In this lab, you will visualize survey data with histograms. The dataset is provided as a SQLite database, and you will use SQL queries to extract the data you need.


## Objectives


In this lab, you will perform the following:


- Visualize the distribution of data using histograms.

- Visualize relationships between features.

- Explore data composition and comparisons.


## Demo: Working with the database


#### Download the database file.


In [None]:
!wget -O survey-data.sqlite https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/QR9YeprUYhOoLafzlLspAw/survey-results-public.sqlite

#### Install the required libraries and import them


In [None]:
!pip install pandas

In [None]:
!pip install matplotlib seaborn

In [None]:
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # used by the histogram suite later in this notebook

#### Connect to the SQLite database


In [None]:
conn = sqlite3.connect('survey-data.sqlite')
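
Before querying, it can help to see which columns a table exposes. The sketch below uses SQLite's `PRAGMA table_info` for that. The helper name `list_columns` and the stand-in in-memory table are assumptions for illustration; in the lab you would pass the `conn` object created above.

```python
import sqlite3

def list_columns(conn, table):
    """Return (name, declared_type) pairs for every column of `table`."""
    # The table name is interpolated into the statement, so only pass trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]

# Stand-in database so the sketch runs on its own (hypothetical schema).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE main (ResponseId INTEGER, Age TEXT, CompTotal REAL)")
columns = list_columns(demo_conn, "main")
print(columns)
demo_conn.close()
```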

## Demo: Basic SQL queries

**Demo 1: Count the number of rows in the table**


In [None]:
QUERY = "SELECT COUNT(*) FROM main"
df = pd.read_sql_query(QUERY, conn)
print(df)


**Demo 2: List all tables**


In [None]:
QUERY = """
SELECT name as Table_Name 
FROM sqlite_master 
WHERE type = 'table'
"""
pd.read_sql_query(QUERY, conn)


**Demo 3: Group data by age**


In [None]:
QUERY = """
SELECT Age, COUNT(*) as count 
FROM main 
GROUP BY Age 
ORDER BY Age
"""
df_age = pd.read_sql_query(QUERY, conn)
print(df_age)
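
Because `Age` is stored as text brackets, the grouped counts from Demo 3 can be drawn directly as a bar chart, which is the categorical counterpart of a histogram. This is a minimal sketch with a stand-in table; the bracket labels and the name `df_age_demo` are assumptions. Note that `ORDER BY Age` sorts the labels alphabetically, not by age.

```python
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data so the sketch runs on its own; in the lab, query `conn` instead.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE main (Age TEXT)")
demo_conn.executemany(
    "INSERT INTO main VALUES (?)",
    [("18-24 years old",), ("25-34 years old",), ("25-34 years old",), ("35-44 years old",)],
)
query = "SELECT Age, COUNT(*) AS count FROM main GROUP BY Age ORDER BY Age"
df_age_demo = pd.read_sql_query(query, demo_conn)
demo_conn.close()

# One bar per age bracket, bar height = number of respondents.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(df_age_demo["Age"], df_age_demo["count"])
ax.set_xlabel("Age")
ax.set_ylabel("Respondents")
ax.set_title("Respondents per age group")
ax.tick_params(axis="x", rotation=45)
```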


In [None]:
def create_histogram_suite(df):
    """
    Creates comprehensive suite of histograms for data analysis
    """
    # Create figure layout
    fig = plt.figure(figsize=(20, 25))
    gs = fig.add_gridspec(4, 2)
    
    # 1.1 CompTotal Histogram
    ax1 = fig.add_subplot(gs[0, 0])
    sns.histplot(data=df, x='CompTotal', bins=50, ax=ax1)
    ax1.set_title('Distribution of Total Compensation')
    
    # 1.2 YearsCodePro Histogram
    ax2 = fig.add_subplot(gs[0, 1])
    sns.histplot(data=df, x='YearsCodePro', bins=30, ax=ax2)
    ax2.set_title('Years of Professional Coding Experience')
    
    # 2.1 CompTotal by Age Group
    ax3 = fig.add_subplot(gs[1, 0])
    sns.histplot(data=df, x='CompTotal', hue='Age', multiple="stack", ax=ax3)
    ax3.set_title('Compensation Distribution by Age Group')
    
    # 2.2 TimeSearching by Age Group
    ax4 = fig.add_subplot(gs[1, 1])
    sns.histplot(data=df, x='TimeSearching_Num', hue='Age', multiple="stack", ax=ax4)
    ax4.set_title('Time Searching Distribution by Age')
    
    # 3.1 Desired Databases
    ax5 = fig.add_subplot(gs[2, 0])
    db_counts = df['DatabaseWantToWorkWith'].str.split(';').explode().value_counts()
    sns.barplot(x=db_counts.head(10).values, y=db_counts.head(10).index, ax=ax5)
    ax5.set_title('Top 10 Desired Databases')
    
    # 3.2 Remote Work Preferences
    ax6 = fig.add_subplot(gs[2, 1])
    sns.histplot(data=df, x='RemoteWork', ax=ax6)
    ax6.set_title('Remote Work Preferences')
    
    # 4.1 CompTotal for Ages 45-60
    ax7 = fig.add_subplot(gs[3, 0])
    age_mask = (df['Age_Numeric'] >= 45) & (df['Age_Numeric'] <= 60)
    sns.histplot(data=df[age_mask], x='CompTotal', ax=ax7)
    ax7.set_title('Compensation Distribution (Ages 45-60)')
    
    # 4.2 Job Satisfaction by Experience
    ax8 = fig.add_subplot(gs[3, 1])
    sns.histplot(data=df, x='JobSat', hue='YearsCodePro', multiple="stack", ax=ax8)
    ax8.set_title('Job Satisfaction by Years of Experience')
    
    plt.tight_layout()
    return fig
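# Load the full survey table; `df` from Demo 1 holds only the row count.
df_survey = pd.read_sql_query("SELECT * FROM main", conn)

# The function expects numeric helper columns that are not in the table.
# Assumption: use the first number in each label (e.g. the lower bound of a bracket).
df_survey['Age_Numeric'] = df_survey['Age'].str.extract(r'(\d+)')[0].astype(float)
df_survey['TimeSearching_Num'] = df_survey['TimeSearching'].str.extract(r'(\d+)')[0].astype(float)
df_survey['YearsCodePro'] = pd.to_numeric(df_survey['YearsCodePro'], errors='coerce')

# Execute visualization
histogram_analysis = create_histogram_suite(df_survey)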


## Hands-on Lab: Visualizing Data with Histograms


### 1. Visualizing the distribution of data (Histograms)


**1.1 Histogram of `CompTotal` (Total Compensation)**


Objective: Plot a histogram of `CompTotal` to visualize the distribution of respondents' total compensation.


In [None]:
## Write your code here
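
One possible approach, shown as a sketch and not as the reference solution: select the non-null `CompTotal` values with SQL and pass them to `Axes.hist`. The helper name `plot_comp_total` and the stand-in table are assumptions; in the lab you would call the function with `conn`.

```python
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

def plot_comp_total(conn, bins=50):
    """Draw a histogram of non-null CompTotal values and return (ax, values)."""
    query = "SELECT CompTotal FROM main WHERE CompTotal IS NOT NULL"
    raw = pd.read_sql_query(query, conn)["CompTotal"]
    # Coerce anything non-numeric to NaN, then drop it.
    values = pd.to_numeric(raw, errors="coerce").dropna()
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(values, bins=bins, edgecolor="black")
    ax.set_xlabel("CompTotal")
    ax.set_ylabel("Respondents")
    ax.set_title("Distribution of Total Compensation")
    return ax, values

# Stand-in data so the sketch runs on its own (made-up values).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE main (CompTotal REAL)")
demo_conn.executemany(
    "INSERT INTO main VALUES (?)",
    [(40000,), (55000,), (72000,), (None,), (90000,), (120000,)],
)
ax, comp_values = plot_comp_total(demo_conn, bins=5)
demo_conn.close()
```

If a few very large values flatten the plot, restricting the x-range (for example, to values below a chosen percentile) before plotting may make the bulk of the distribution easier to read.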

**1.2 Histogram of `YearsCodePro` (Years of Professional Coding Experience)**


Objective: Plot a histogram of `YearsCodePro` to analyze the distribution of coding experience among respondents.


In [None]:
## Write your code here
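
A sketch of one way to prepare `YearsCodePro` for a histogram, assuming the column mixes numeric strings with text labels such as "Less than 1 year" and "More than 50 years". Those labels, the replacement values 0 and 51, and the helper name `years_to_number` are assumptions; check them against the values in your data.

```python
import pandas as pd
import matplotlib.pyplot as plt

def years_to_number(series):
    """Convert YearsCodePro-style values to floats; unknown text becomes NaN."""
    # Assumed text labels, mapped to stand-in numeric values.
    replaced = series.replace({"Less than 1 year": "0", "More than 50 years": "51"})
    return pd.to_numeric(replaced, errors="coerce")

# Stand-in values so the sketch runs on its own.
years_raw = pd.Series(["3", "10", "Less than 1 year", "More than 50 years", None, "7"])
years = years_to_number(years_raw).dropna()

fig, ax = plt.subplots(figsize=(8, 5))
# One bin per year, with edges at 0, 1, ..., 52.
ax.hist(years, bins=list(range(0, 53)), edgecolor="black")
ax.set_xlabel("YearsCodePro")
ax.set_ylabel("Respondents")
ax.set_title("Years of Professional Coding Experience")
```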

### 2. Visualizing Relationships in Data


**2.1 Histogram Comparison of `CompTotal` by `Age` Group**


Objective: Use histograms to compare the distribution of `CompTotal` across different `Age` groups.


In [None]:
## Write your code here
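
One way to compare groups is to stack one histogram per group on a shared set of bin edges, so the bars line up. The sketch below assumes a DataFrame with `Age` and `CompTotal` columns; the helper name `stacked_hist_by_group` and the stand-in rows are illustrative only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def stacked_hist_by_group(df, value_col, group_col, bins=20):
    """Stack one histogram per group on shared bin edges; return (ax, groups)."""
    data = df[[value_col, group_col]].dropna()
    # Map each group label to the array of its values.
    groups = {name: grp[value_col].to_numpy() for name, grp in data.groupby(group_col)}
    # Compute the edges once from all values so every group uses the same bins.
    edges = np.histogram_bin_edges(data[value_col], bins=bins)
    fig, ax = plt.subplots(figsize=(9, 5))
    ax.hist(list(groups.values()), bins=edges, stacked=True, label=list(groups.keys()))
    ax.set_xlabel(value_col)
    ax.set_ylabel("Respondents")
    ax.legend(title=group_col)
    return ax, groups

# Stand-in rows so the sketch runs on its own (made-up values).
demo = pd.DataFrame({
    "Age": ["18-24 years old", "25-34 years old", "25-34 years old", "35-44 years old", None],
    "CompTotal": [30000, 60000, 65000, None, 50000],
})
ax, groups = stacked_hist_by_group(demo, "CompTotal", "Age", bins=4)
```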

**2.2 Histogram of `TimeSearching` for Different Age Groups**


Objective: Use histograms to explore the distribution of `TimeSearching` (time spent searching for information) for respondents across different age groups.


In [None]:
## Write your code here
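
If `TimeSearching` is stored as text categories, a cross-tabulation against `Age` gives the counts to plot. This sketch assumes the category labels listed in `TIME_ORDER`; they are not confirmed by this notebook, so adjust them to the values in your data. The helper name `time_searching_by_age` is also hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed category labels, shortest to longest.
TIME_ORDER = [
    "Less than 15 minutes a day",
    "15-30 minutes a day",
    "30-60 minutes a day",
    "60-120 minutes a day",
    "Over 120 minutes a day",
]

def time_searching_by_age(df):
    """Count respondents per (Age, TimeSearching) pair, columns in a sensible order."""
    table = pd.crosstab(df["Age"], df["TimeSearching"])
    known = [label for label in TIME_ORDER if label in table.columns]
    others = [label for label in table.columns if label not in TIME_ORDER]
    return table[known + others]

# Stand-in rows so the sketch runs on its own.
demo = pd.DataFrame({
    "Age": ["25-34 years old", "25-34 years old", "35-44 years old", "25-34 years old"],
    "TimeSearching": [
        "15-30 minutes a day",
        "15-30 minutes a day",
        "Over 120 minutes a day",
        "Less than 15 minutes a day",
    ],
})
table = time_searching_by_age(demo)

fig, ax = plt.subplots(figsize=(9, 5))
table.plot(kind="bar", stacked=True, ax=ax)
ax.set_ylabel("Respondents")
ax.set_title("Time Searching Distribution by Age")
```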

### 3. Visualizing the Composition of Data


**3.1 Histogram of Most Desired Databases (`DatabaseWantToWorkWith`)**


Objective: Visualize the most desired databases for future learning using a histogram of the top 5 databases.


In [None]:
## Write your code here
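
The demo function earlier in this notebook splits `DatabaseWantToWorkWith` on `;` before counting, which suggests each cell holds a semicolon-separated list. Under that assumption, the following sketch counts each database and keeps the most frequent ones; the helper name `top_databases` and the stand-in values are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

def top_databases(series, n=5):
    """Split semicolon-separated answers and return the n most frequent entries."""
    return (
        series.dropna()
        .str.split(";")
        .explode()       # one row per individual database
        .str.strip()
        .value_counts()
        .head(n)
    )

# Stand-in answers so the sketch runs on its own.
demo = pd.Series(["PostgreSQL;SQLite", "PostgreSQL;MySQL", "PostgreSQL", None, "SQLite;Redis"])
top = top_databases(demo, n=2)

fig, ax = plt.subplots(figsize=(8, 4))
# Reverse the order so the most frequent entry is drawn at the top.
ax.barh(top.index[::-1], top.values[::-1])
ax.set_xlabel("Respondents")
ax.set_title("Most Desired Databases")
```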

**3.2 Histogram of Preferred Work Locations (`RemoteWork`)**


Objective: Use a histogram to explore the distribution of preferred work arrangements (`RemoteWork`).


In [None]:
## Write your code here
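
For a categorical column such as `RemoteWork`, the counting can be done in SQL and only the result plotted. This is a sketch under assumptions: the helper name `remote_work_counts` and the stand-in category labels are hypothetical, and in the lab you would pass `conn`.

```python
import sqlite3
import matplotlib.pyplot as plt

def remote_work_counts(conn):
    """Return (RemoteWork, respondents) rows, most common arrangement first."""
    query = """
    SELECT RemoteWork, COUNT(*) AS respondents
    FROM main
    WHERE RemoteWork IS NOT NULL
    GROUP BY RemoteWork
    ORDER BY respondents DESC, RemoteWork
    """
    return conn.execute(query).fetchall()

# Stand-in data so the sketch runs on its own (made-up labels).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE main (RemoteWork TEXT)")
demo_conn.executemany(
    "INSERT INTO main VALUES (?)",
    [("Remote",), ("Remote",), ("Remote",), ("Hybrid",), ("Hybrid",), ("In-person",), (None,)],
)
counts = remote_work_counts(demo_conn)
demo_conn.close()

labels = [row[0] for row in counts]
heights = [row[1] for row in counts]
fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(labels, heights)
ax.set_ylabel("Respondents")
ax.set_title("Remote Work Preferences")
```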

### 4. Visualizing Comparison of Data


**4.1 Histogram of `CompTotal` for Ages 45 to 60**


Objective: Plot the histogram for `CompTotal` within the age group 45 to 60 to analyze compensation distribution among mid-career respondents.


In [None]:
## Write your code here
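
If `Age` is stored as brackets and not as exact ages, the 45 to 60 range has to be approximated by whole brackets. The sketch below assumes the labels "45-54 years old" and "55-64 years old", which would extend the range to 64; both the labels and the helper name `comp_for_brackets` are assumptions to verify against your data.

```python
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

# Assumed bracket labels; together they cover ages 45 to 64, not exactly 45 to 60.
AGE_BRACKETS = ("45-54 years old", "55-64 years old")

def comp_for_brackets(conn, brackets=AGE_BRACKETS):
    """Return Age and CompTotal for respondents in the given age brackets."""
    placeholders = ", ".join("?" for _ in brackets)
    query = (
        "SELECT Age, CompTotal FROM main "
        f"WHERE CompTotal IS NOT NULL AND Age IN ({placeholders})"
    )
    # Bracket labels are passed as bound parameters, not pasted into the SQL.
    return pd.read_sql_query(query, conn, params=list(brackets))

# Stand-in data so the sketch runs on its own (made-up values).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE main (Age TEXT, CompTotal REAL)")
demo_conn.executemany(
    "INSERT INTO main VALUES (?, ?)",
    [
        ("25-34 years old", 60000),
        ("45-54 years old", 95000),
        ("45-54 years old", None),
        ("55-64 years old", 110000),
        ("65 years or older", 80000),
    ],
)
subset = comp_for_brackets(demo_conn)
demo_conn.close()

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(subset["CompTotal"], bins=10, edgecolor="black")
ax.set_xlabel("CompTotal")
ax.set_ylabel("Respondents")
ax.set_title("Compensation Distribution (Ages 45-60, approximated by brackets)")
```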

**4.2 Histogram of Job Satisfaction (`JobSat`) by `YearsCodePro`**


Objective: Plot the histogram for `JobSat` scores based on respondents' years of professional coding experience.


In [None]:
## Write your code here
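
Using every distinct `YearsCodePro` value as its own group produces many overlapping series, so one option is to bin experience into a few bands first. The sketch below uses `pd.cut` for that. The band boundaries, the `ExperienceBand` column, the helper name `add_experience_band`, and the 0 to 10 `JobSat` scale are all assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def add_experience_band(df):
    """Add an ExperienceBand column and drop rows missing the band or JobSat."""
    years = pd.to_numeric(df["YearsCodePro"], errors="coerce")
    bands = pd.cut(
        years,
        bins=[0, 5, 10, 20, np.inf],
        right=False,  # intervals are [0, 5), [5, 10), [10, 20), [20, inf)
        labels=["0-4", "5-9", "10-19", "20+"],
    )
    return df.assign(ExperienceBand=bands).dropna(subset=["ExperienceBand", "JobSat"])

# Stand-in rows so the sketch runs on its own (made-up values).
demo = pd.DataFrame({
    "YearsCodePro": ["2", "7", "12", "25", "Less than 1 year", "8"],
    "JobSat": [6, 8, 7, 9, 5, None],
})
banded = add_experience_band(demo)

fig, ax = plt.subplots(figsize=(9, 5))
edges = np.arange(-0.5, 11.5, 1)  # one bin per integer score, assuming a 0-10 scale
for band, grp in banded.groupby("ExperienceBand", observed=True):
    ax.hist(grp["JobSat"], bins=edges, alpha=0.5, label=str(band))
ax.set_xlabel("JobSat")
ax.set_ylabel("Respondents")
ax.legend(title="Years of experience")
```

In this sketch, text values such as "Less than 1 year" are coerced to missing and dropped; mapping them to a number first would keep those respondents.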

### Final step: Close the database connection


Once you've completed the lab, make sure to close the connection to the SQLite database:



In [None]:
conn.close()
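
As an alternative to calling `close()` by hand, the standard library's `contextlib.closing` closes the connection when the `with` block ends, even if a query raises an error. The sketch below uses an in-memory stand-in database; the names `demo_conn`, `row_count`, and `still_open` are illustrative.

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as demo_conn:
    demo_conn.execute("CREATE TABLE main (Age TEXT)")
    demo_conn.execute("INSERT INTO main VALUES ('25-34 years old')")
    row_count = demo_conn.execute("SELECT COUNT(*) FROM main").fetchone()[0]

# The connection is closed once the block exits; further queries raise an error.
try:
    demo_conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(row_count, still_open)
```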

### Summary


In this lab, you used histograms to visualize various aspects of the dataset, focusing on:

- Distribution of compensation and professional coding experience.

- Relationships between age group and both compensation and time spent searching for information.

- Composition of data by desired databases and work environments.

- Comparisons of job satisfaction across years of experience.

Histograms helped reveal patterns and distributions in the data, enhancing your understanding of developer demographics and preferences.


## Authors:
Ayushi Jain


### Other Contributors:
- Rav Ahuja
- Lakshmi Holla
- Malika


Copyright © IBM Corporation. All rights reserved.
