# Conversation Transcript - 2024-11-05

### Step-by-Step Guide: Web Scraping & Visualization for Economic Impact Analysis of Immigration


### Step 1: Define the Project Structure and Key Questions
The goal is to collect, analyze, and visualize data on the economic impact of undocumented immigrants in the US.

- **Key Questions**: Population size, criminality rates, countries of origin, economic contributions.
- **Tools**: `requests`, `BeautifulSoup`, `pandas`, `matplotlib`, `seaborn`, and `streamlit`.
    


### Step 3: Web Scraping Code
The code below fetches a page from a specified source (here, a USAFacts article) and records the first heading and first paragraph of each `<section>` element:
    

In [1]:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the data source
url = "https://usafacts.org/articles/what-can-the-data-tell-us-about-unauthorized-immigration/"

# Get the page content; the timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=30)

immigration_data = []
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')

    # Record the first <h2> and first <p> of each <section> (None if absent)
    for section in soup.find_all('section'):
        header = section.find('h2')
        paragraph = section.find('p')
        immigration_data.append({
            'Section': header.get_text(strip=True) if header else None,
            'Content': paragraph.get_text(strip=True) if paragraph else None,
        })
else:
    print(f"Failed to retrieve data (HTTP {response.status_code}).")

# Convert to DataFrame for further processing (empty if the request failed,
# so later cells can still reference df)
df = pd.DataFrame(immigration_data, columns=['Section', 'Content'])
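
The section-by-section extraction can be tried without a network request. The sketch below runs the same idea (first `<h2>` and first `<p>` per `<section>`) against a small inline HTML string. The HTML and the names `SAMPLE_HTML` and `sections_to_frame` are made up for illustration; they do not reflect the actual markup of the USAFacts page.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML standing in for a downloaded page (not real USAFacts markup)
SAMPLE_HTML = """
<html><body>
  <section><h2>Overview</h2><p>First paragraph.</p><p>Second paragraph.</p></section>
  <section><p>Section without a heading.</p></section>
  <section><h2>Empty section</h2></section>
</body></html>
"""


def sections_to_frame(html):
    """Return one row per <section>: first <h2> text and first <p> text."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for section in soup.find_all("section"):
        heading = section.find("h2")
        paragraph = section.find("p")
        rows.append({
            # Missing elements become None instead of raising AttributeError
            "Section": heading.get_text(strip=True) if heading else None,
            "Content": paragraph.get_text(strip=True) if paragraph else None,
        })
    return pd.DataFrame(rows, columns=["Section", "Content"])


demo_df = sections_to_frame(SAMPLE_HTML)
print(demo_df)
```

Because only the first paragraph of each section is kept, the resulting frame holds free text rather than numeric columns, which is why it cannot be plotted directly.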
    


### Step 4: Data Cleaning & Adding Sample Data
Ensure the DataFrame has the `Year`, `Population`, `Country`, and `Criminality Rate` columns that the plots need. The scraped frame holds only section text, so sample values are used for demonstration.
    

In [2]:

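# The scraped frame has one row per <section> (19 in this run), so assigning
# 5-element lists to it as new columns raises
# "ValueError: Length of values (5) does not match length of index (19)".
# Keep the scraped text under its own name and build the sample data as a
# separate 5-row frame instead.
required_columns = ['Year', 'Population', 'Country', 'Criminality Rate']
if not set(required_columns).issubset(df.columns):
    scraped_df = df  # scraped 'Section'/'Content' text, kept for reference
    # Sample data for demonstration only; these values are not scraped
    df = pd.DataFrame({
        'Year': [2000, 2005, 2010, 2015, 2020],
        'Population': [8.2, 9.1, 10.3, 11.5, 12.7],
        'Country': ['Mexico', 'India', 'China', 'El Salvador', 'Guatemala'],
        'Criminality Rate': [5, 3, 2, 6, 4],
    })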
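
pandas requires a list assigned as a new column to contain exactly one value per existing row. The sketch below reproduces that rule on a small made-up frame and then shows one possible way around it: checking which plotting columns are missing and building the sample data as a separate frame. The helper `missing_columns` and the variables `scraped` and `plot_df` are illustrative names, not part of the original session.

```python
import pandas as pd


def missing_columns(frame, required):
    """Return the names in `required` that are not columns of `frame`."""
    return [name for name in required if name not in frame.columns]


# Hypothetical stand-in for the scraped frame: 3 rows of section text
scraped = pd.DataFrame({"Section": ["A", "B", "C"], "Content": ["x", "y", "z"]})

# Assigning 2 values to a 3-row frame fails; capture the message to inspect it
error_message = None
try:
    scraped["Year"] = [2000, 2005]
except ValueError as exc:
    error_message = str(exc)
print(error_message)

required = ["Year", "Population", "Country", "Criminality Rate"]
if missing_columns(scraped, required):
    # Sample values for demonstration only, held in their own 5-row frame
    plot_df = pd.DataFrame({
        "Year": [2000, 2005, 2010, 2015, 2020],
        "Population": [8.2, 9.1, 10.3, 11.5, 12.7],
        "Country": ["Mexico", "India", "China", "El Salvador", "Guatemala"],
        "Criminality Rate": [5, 3, 2, 6, 4],
    })
else:
    plot_df = scraped
```

Keeping the sample data in its own frame avoids padding or truncating the scraped rows just to make the lengths agree.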


### Step 5: Data Visualization
Using `matplotlib` and `seaborn` to visualize data on population over time and criminality rates by country.
    

In [3]:

import matplotlib.pyplot as plt
import seaborn as sns

def plot_population_over_time(df):
    plt.figure(figsize=(10, 6))
    sns.lineplot(data=df, x='Year', y='Population', marker='o')
    plt.title('Estimated Immigrant Population Over Time')
    plt.xlabel('Year')
    plt.ylabel('Population Estimate')
    plt.show()

def plot_criminality_by_country(df):
    plt.figure(figsize=(12, 6))
    sns.barplot(data=df, x='Country', y='Criminality Rate')
    plt.title('Criminality Rates by Country of Origin')
    plt.xlabel('Country')
    plt.ylabel('Criminality Rate (%)')
    plt.xticks(rotation=45)
    plt.show()

# Run the visualizations
plot_population_over_time(df)
plot_criminality_by_country(df)
    

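When `df` has no `Year` column (for example, because the Step 4 cell did not finish), the first plot fails with:

ValueError: Could not interpret value `Year` for `x`. An entry with this name does not appear in `data`.

and only an empty figure is left behind (`<Figure size 1000x600 with 0 Axes>`).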
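
One way to make a missing column easier to diagnose is to check for it before drawing. The sketch below is a variant of the population plot that validates its input, draws on an `Axes` object, and returns it so the result can be saved or inspected. It uses plain `matplotlib` rather than `seaborn` to keep the example small, and the `Agg` backend so it runs without a display. The function name `plot_population` and the `demo` frame are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd


def plot_population(frame, ax=None):
    """Draw Population against Year and return the Axes that was used."""
    missing = {"Year", "Population"} - set(frame.columns)
    if missing:
        # Fail early with the names of the absent columns
        raise KeyError(f"missing columns: {sorted(missing)}")
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 6))
    ax.plot(frame["Year"], frame["Population"], marker="o")
    ax.set_title("Estimated Immigrant Population Over Time")
    ax.set_xlabel("Year")
    ax.set_ylabel("Population Estimate")
    return ax


# Sample values for demonstration only
demo = pd.DataFrame({"Year": [2000, 2005, 2010], "Population": [8.2, 9.1, 10.3]})
ax = plot_population(demo)
```

The returned axes can be written to disk with `ax.figure.savefig(...)`, which is useful when the code runs as a script rather than in a notebook.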


### Step 6: Streamlit Interactive Dashboard
Creating an interactive dashboard to view data and visualize trends using Streamlit.
    

In [4]:

import streamlit as st

st.title("Economic Impact of Illegal & Undocumented Immigrants in the US")
st.write("A dashboard to explore immigration data and its economic impact.")

# Display the data in Streamlit
st.dataframe(df)

# Interactive line chart in Streamlit (needs both 'Year' and 'Population')
if {'Year', 'Population'}.issubset(df.columns):
    st.line_chart(df.set_index('Year')['Population'])
    

2024-11-04 22:20:02.684 
  command:

    streamlit run c:\ProgramData\Anaconda3\envs\ImmigQuant\lib\site-packages\ipykernel_launcher.py [ARGUMENTS]



### Usage Instructions
- **Standalone Python Script**: Run `python script_name.py` in a terminal to display `matplotlib` visualizations.
- **Streamlit Dashboard**: Run `streamlit run script_name.py` in a terminal to open an interactive dashboard in a web browser.
    