<div style="display: flex; align-items: center; margin-bottom: 20px;">
    <img src="https://www.armaxgroup.com.ua/wp-content/uploads/2023/04/ireland-long-stay-visa.jpeg" 
         alt="Ireland Visa Image" 
         style="width: 400px; margin-right: 20px; border-radius: 10px; box-shadow: 0 0 10px rgba(0,0,0,0.1);"/>
    <div style="max-width: 600px; font-family: Arial, sans-serif;">
        <h1 style="color: #2c3e50;">Domestic Residence and Permissions Data Analysis</h1>
        <p><strong>Developed by:</strong> Christiano Ferreira</p>
    </div>
</div>

<h2 style="color: #2c3e50;">Introduction</h2>
<p style="font-size: 16px; line-height: 1.6;">
The dataset analyzed in this project is sourced from 
<a href="https://data.gov.ie/" target="_blank">Data.gov.ie</a>, a platform offering public access to government data in Ireland.
The specific dataset used is titled <strong>"Domestic Residence & Permissions Applications and Decisions by Year and Nationality"</strong>,
covering the years 2017 to 2024. It reports, for each nationality and year, the number of domestic residence and permission applications handled by the Irish authorities,
with each count labelled by status: <em>"Received," "Granted,"</em> or <em>"Refused."</em>
</p>


## Summary of Libraries
- **Pandas:** For data manipulation and handling tabular data.
- **Plotly:** For creating rich, interactive visualizations and charts.
- **Pycountry Convert:** For country-to-continent mapping.
- **NumPy:** For numerical operations and data handling.
- **Scikit-Learn:** For linear regression modeling and predictive analysis.


In [243]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import pycountry_convert as pc
import numpy as np
from sklearn.linear_model import LinearRegression


## Reading the Data
The dataset is imported using Pandas. Cells containing `*` are replaced with `NaN` so that the year columns can be cast to float and summarised numerically.


In [244]:
# Reading the Data
file_path = r"C:\Users\Chris\Desktop\PFDA\project\Domestic Residence and Permissions.csv"  # raw string: single backslashes
data = pd.read_csv(file_path, encoding='latin1')

# Data Cleaning and Transformation
for col in data.columns[3:]:
    data[col] = data[col].replace('*', np.nan).astype(float)

status_filters = ["Received", "Refused", "Granted"]
filtered_data = data[data['Status'].isin(status_filters)]
year_columns = ['2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']

# Transforming data for analysis
melted_data = filtered_data.melt(
    id_vars=['Type', 'Status', 'Nationality'], 
    value_vars=year_columns, 
    var_name='Year', 
    value_name='Applications'
)
melted_data['Applications'] = pd.to_numeric(melted_data['Applications'], errors='coerce')
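
To make the reshaping concrete, here is a minimal sketch on a made-up two-row table. The numbers and the `toy` and `long_toy` names are hypothetical and not taken from the dataset. It shows `*` becoming `NaN` and the wide year columns turning into one row per year.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the wide CSV layout: one column per year.
toy = pd.DataFrame({
    "Type": ["Applications", "Decisions"],
    "Status": ["Received", "Granted"],
    "Nationality": ["Brazil", "Brazil"],
    "2017": ["10", "*"],
    "2018": ["12", "7"],
})

# Replace the '*' placeholder with NaN, then cast the year columns to float.
year_cols = ["2017", "2018"]
toy[year_cols] = toy[year_cols].replace("*", np.nan).astype(float)

# Wide -> long: one row per (Type, Status, Nationality, Year).
long_toy = toy.melt(
    id_vars=["Type", "Status", "Nationality"],
    value_vars=year_cols,
    var_name="Year",
    value_name="Applications",
)
print(long_toy)
```

Note that `Year` holds the former column labels, so it is a string column after `melt`.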


## Calculating Approval and Refusal Rates
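This step divides each nationality's granted and refused counts by that nationality's yearly total. As coded, the total sums the Received, Granted, and Refused rows together, so the resulting rates are shares of that combined figure rather than of applications received.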


In [245]:
# Calculating approval and refusal rates
total_applications = melted_data.groupby(['Year', 'Nationality'])['Applications'].transform('sum')
melted_data['Approval Rate'] = np.where(melted_data['Status'] == 'Granted', melted_data['Applications'] / total_applications, np.nan)
melted_data['Refusal Rate'] = np.where(melted_data['Status'] == 'Refused', melted_data['Applications'] / total_applications, np.nan)
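
Because the denominator above adds the Received, Granted, and Refused rows together, one alternative is to divide granted and refused counts by the received count for the same nationality and year. The sketch below does this on made-up numbers; the `toy`, `counts`, and `rates` names are illustrative assumptions. Decisions issued in one year may relate to applications received in an earlier one, so even this ratio is only an approximation.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data for one nationality over two years.
toy = pd.DataFrame({
    "Year": ["2017"] * 3 + ["2018"] * 3,
    "Nationality": ["Brazil"] * 6,
    "Status": ["Received", "Granted", "Refused"] * 2,
    "Applications": [100.0, 80.0, 10.0, 200.0, 150.0, 30.0],
})

# One row per (Year, Nationality), one column per status.
counts = toy.pivot_table(
    index=["Year", "Nationality"],
    columns="Status",
    values="Applications",
    aggfunc="sum",
)

# Divide by received applications; a zero denominator becomes NaN.
received = counts["Received"].replace(0, np.nan)
rates = pd.DataFrame({
    "Approval Rate": counts["Granted"] / received,
    "Refusal Rate": counts["Refused"] / received,
}).reset_index()
print(rates)
```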


## Descriptive Statistics
Summary of the dataset's numerical values.


In [246]:
# Descriptive Statistics
print("\nDescriptive Statistics:")
print(melted_data.describe())



Descriptive Statistics:
       Applications  Approval Rate  Refusal Rate
count   4410.000000     723.000000    544.000000
mean      54.104308       0.407647      0.102784
std      207.201219       0.102572      0.086688
min        0.000000       0.000000      0.000000
25%        0.000000       0.352250      0.048306
50%        0.000000       0.400000      0.091997
75%       18.000000       0.453621      0.138151
max     3149.000000       1.000000      1.000000


## Data Visualizations
The following visualizations help to explore the trends and patterns in the dataset.


In [247]:
# Line Chart for Trends
summary = melted_data.groupby(['Year', 'Status']).agg({'Applications': 'sum'}).reset_index()
fig = px.line(summary, x='Year', y='Applications', color='Status', title='Trend of Applications by Year and Status')
fig.show()


## Stacked Bar Chart for Applications
This chart displays how the volume of received, granted, and refused applications evolved over the years.


In [248]:
# Stacked Bar Chart for Granted, Refused, and Received Applications
granted_received_refused_summary = melted_data.groupby(['Year', 'Status']).agg({'Applications': 'sum'}).reset_index()
fig = px.bar(granted_received_refused_summary, x='Year', y='Applications', color='Status', title='Stacked Bar Chart for Applications')
fig.show()


## Treemap for Nationalities Over Time
The treemap shows how applications are distributed across nationalities from **2017 to 2024**. Larger tiles represent higher application volumes, and the interactive slider switches between years, making it easier to spot trends and significant fluctuations for specific nationalities.


In [249]:
# Aggregate applications per year and nationality for the treemap.
# Assumption: the cell that originally built `treemap_data_yearly` is not shown,
# so it is rebuilt here from the 'Received' rows of `melted_data`.
treemap_data_yearly = (
    melted_data[melted_data['Status'] == 'Received']
    .groupby(['Year', 'Nationality'], as_index=False)['Applications']
    .sum()
)
years = sorted(treemap_data_yearly['Year'].unique())

# Create a base figure object
fig = go.Figure()

# Add one treemap trace per year; only the first year starts visible
for year in years:
    year_data = treemap_data_yearly[treemap_data_yearly['Year'] == year]
    fig.add_trace(
        go.Treemap(
            labels=year_data['Nationality'],
            parents=[""] * len(year_data),
            values=year_data['Applications'],
            textinfo="label+value",
            visible=(year == years[0]),
        )
    )

# Create a slider for year selection: each step shows exactly one trace
steps = []
for i, year in enumerate(years):
    step = {
        "method": "update",
        "args": [{"visible": [i == j for j in range(len(fig.data))]},
                 {"title": f"Treemap of Applications by Nationality - {year}"}],
        "label": str(year)
    }
    steps.append(step)

# Update the layout to include the slider
fig.update_layout(
    title="Treemap of Applications by Nationality Over Time",
    sliders=[{
        "active": 0,
        "currentvalue": {"prefix": "Year: "},
        "steps": steps
    }]
)

# Display the interactive treemap with a slider
fig.show()


## Interactive Geographic Map
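This map visualizes the distribution of received applications across countries using a world map.


In [250]:
# Interactive Geographic Map
# Plot one value per country and year: the 'Received' rows only
# (melted_data also holds 'Granted' and 'Refused' rows for each country).
map_data = melted_data[melted_data['Status'] == 'Received']
fig = px.choropleth(map_data, locations='Nationality', locationmode='country names',
                    color='Applications', hover_name='Nationality', animation_frame='Year',
                    title='Geographic Distribution of Received Applications Over Time')
fig.show()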


## Bar Chart for Applications by Continent
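This bar chart shows the distribution of applications across continents over the years. Nationalities that `pycountry_convert` cannot map are grouped under `Other`, and each bar sums the Received, Granted, and Refused rows.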


In [251]:
# Converting Nationality to Continent
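def get_continent(country):
    """Return the continent code (e.g. 'AF', 'AS') for a country name, or 'Other'."""
    try:
        country_code = pc.country_name_to_country_alpha2(country)
        return pc.country_alpha2_to_continent_code(country_code)
    except Exception:
        # Names the library does not recognise fall back to 'Other'
        return 'Other'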

melted_data['Continent'] = melted_data['Nationality'].apply(get_continent)

# Bar Chart by Continent
continent_summary = melted_data.groupby(['Continent', 'Year']).agg({'Applications': 'sum'}).reset_index()
fig = px.bar(continent_summary, x='Year', y='Applications', color='Continent', title='Applications by Continent Over Time')
fig.show()
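
The dataset uses labels such as `United States of America (the)` and `Turkiye`, and any label the lookup fails on is silently counted as `Other`. The sketch below shows one way to normalise labels before the lookup. The alias table, the helper names, and the stand-in `demo_table` are hypothetical; whether `pycountry_convert` actually needs these particular aliases is an assumption to check against the real data.

```python
import re

# Hypothetical aliases for dataset labels that a name-based lookup may not know.
ALIASES = {"Turkiye": "Turkey"}

def normalise_country(name: str) -> str:
    """Strip a trailing '(the)' style qualifier and apply known aliases."""
    cleaned = re.sub(r"\s*\([^)]*\)\s*$", "", name).strip()
    return ALIASES.get(cleaned, cleaned)

def continent_or_other(name, lookup):
    """Return lookup(normalised name), or 'Other' when the name is unknown."""
    try:
        return lookup(normalise_country(name))
    except KeyError:
        return "Other"

# Stand-in lookup table used only for this demonstration.
demo_table = {"United States of America": "NA", "Turkey": "AS", "Brazil": "SA"}
for label in ["United States of America (the)", "Turkiye", "Brazil", "Atlantis"]:
    print(label, "->", continent_or_other(label, demo_table.__getitem__))
```

Stripping every trailing parenthetical is lossy for labels where the qualifier carries meaning, so a production version would need an explicit alias for each such case. Listing the labels that end up as `Other` is a quick way to find them.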


## Identifying Top 10 Nationalities for Prediction (Based on All Years)
The top 10 nationalities will be selected based on the **total received applications** over all years in the dataset to ensure a comprehensive analysis.


In [252]:
# Filter the top 10 nationalities based on total received applications across all years
top_10_received = melted_data[melted_data['Status'] == 'Received'].groupby('Nationality')['Applications'].sum().nlargest(10).index

# Filter the data for those top 10 nationalities
top_10_data = melted_data[(melted_data['Nationality'].isin(top_10_received)) & (melted_data['Status'] == 'Received')]

# Display the top nationalities and their total applications for verification
top_10_received, top_10_data.head()



(Index(['Brazil', 'Nigeria', 'India', 'China', 'Pakistan', 'Georgia', 'Algeria',
        'United States of America (the)', 'Turkiye', 'Somalia'],
       dtype='object', name='Nationality'),
                                              Type    Status Nationality  Year  \
 3   Domestic Residence & Permissions applications  Received     Algeria  2017   
 29  Domestic Residence & Permissions applications  Received      Brazil  2017   
 42  Domestic Residence & Permissions applications  Received       China  2017   
 71  Domestic Residence & Permissions applications  Received     Georgia  2017   
 87  Domestic Residence & Permissions applications  Received       India  2017   
 
     Applications  Approval Rate  Refusal Rate Continent  
 3           31.0            NaN           NaN        AF  
 29         653.0            NaN           NaN        SA  
 42         915.0            NaN           NaN        AS  
 71          12.0            NaN           NaN        AS  
 87         518.0    

## Training Separate Models for Each Top 10 Nationality
A **linear regression model** will be trained separately for each nationality, using historical data to predict the trend for the next 10 years.


In [253]:
# Collect one prediction frame per nationality
prediction_frames = []

# Fit a separate model for each nationality
for nationality in top_10_received:
    # Rows for this nationality, dropping years with missing counts
    # (a new name is used so the original `data` frame is not overwritten)
    nat_data = top_10_data[top_10_data['Nationality'] == nationality].dropna(subset=['Applications'])
    X = nat_data[['Year']].astype(int)  # melt() stores years as strings
    y = nat_data['Applications']

    # Train the Linear Regression model
    model = LinearRegression()
    model.fit(X, y)

    # Generate future years with the same feature name as the training data
    future_years = pd.DataFrame({"Year": range(2025, 2035)})
    predicted_applications = model.predict(future_years)

    # Store the predictions in the same layout as the historical data
    prediction_frames.append(pd.DataFrame({
        "Year": future_years['Year'],
        "Applications": predicted_applications,
        "Status": "Received",
        "Nationality": nationality
    }))

# Combine all nationalities into one DataFrame
future_predictions_all = pd.concat(prediction_frames)

# Display the predicted dataset
future_predictions_all.head()


   Year  Applications    Status Nationality
0  2025   2605.178571  Received      Brazil
1  2026   2864.023810  Received      Brazil
2  2027   3122.869048  Received      Brazil
3  2028   3381.714286  Received      Brazil
4  2029   3640.559524  Received      Brazil
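
A straight line extrapolated ten years ahead can drift below zero for a nationality whose applications are falling, which is not meaningful for counts. The sketch below uses made-up numbers and `np.polyfit` with degree 1 (an ordinary least-squares line, the same model family as `LinearRegression`) to show the effect and one simple guard, clipping at zero. The series and variable names are hypothetical.

```python
import numpy as np

years = np.arange(2017, 2025)  # 2017..2024
# Hypothetical declining series of received applications.
received = np.array([400, 350, 300, 250, 200, 150, 100, 50], dtype=float)

# Degree-1 least-squares fit: received ~ slope * year + intercept.
slope, intercept = np.polyfit(years, received, deg=1)

future_years = np.arange(2025, 2035)  # 2025..2034
raw_forecast = slope * future_years + intercept
clipped_forecast = np.clip(raw_forecast, 0, None)  # counts cannot be negative

for year, raw, clipped in zip(future_years, raw_forecast, clipped_forecast):
    print(year, round(raw, 1), round(clipped, 1))
```

Clipping only hides the symptom. A forecast that reaches zero is also a sign that a linear trend is a poor fit for that series.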


## Visualizing the Predicted Trends for the Top 10 Nationalities
The chart below displays the **projected trends** for the next 10 years (2025-2034) for each of the top 10 nationalities, based on historical data.


In [254]:
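# Combine historical and predicted data for a complete view
combined_data = pd.concat([top_10_data, future_predictions_all])
# Historical years are strings (from melt) and predicted years are integers; use one type
combined_data['Year'] = combined_data['Year'].astype(int)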

# Line chart visualizing the trends for each nationality
fig = px.line(combined_data, x='Year', y='Applications', color='Nationality',
              title="Predicted Number of Received Applications (Top 10 Nationalities Based on All Years)")
fig.show()
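
In a combined chart it helps to distinguish observed values from extrapolated ones. The sketch below tags each row with a `Kind` column before concatenating. The frames, values, and the `Kind` name are hypothetical stand-ins, not the notebook's own variables.

```python
import pandas as pd

# Hypothetical stand-ins for the historical and forecast frames.
history = pd.DataFrame({
    "Year": ["2023", "2024"],  # melt() yields year labels as strings
    "Applications": [2100.0, 2300.0],
    "Nationality": "Brazil",
})
forecast = pd.DataFrame({
    "Year": [2025, 2026],
    "Applications": [2605.2, 2864.0],
    "Nationality": "Brazil",
})

# Tag each source, then stack them into one frame.
combined = pd.concat(
    [history.assign(Kind="Historical"), forecast.assign(Kind="Predicted")],
    ignore_index=True,
)
# Use one integer dtype for Year so the axis is ordered numerically.
combined["Year"] = combined["Year"].astype(int)
combined = combined.sort_values(["Nationality", "Year"]).reset_index(drop=True)
print(combined)
```

Passing `line_dash="Kind"` to `px.line` would then draw the two segments with different dash styles.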


## Conclusion

The analysis of the Domestic Residence & Permissions dataset provides key insights into application trends and decision patterns across multiple years and nationalities. Significant fluctuations in application volumes reflect global migration patterns, legislative changes, and socio-economic factors. Peaks in applications may correlate with global crises, such as the COVID-19 pandemic, while periods of decline could be linked to policy reforms.

Applications from certain nationalities, such as Afghanistan, Nigeria, and India, consistently show higher refusal rates, indicating possible disparities in decision-making processes. These patterns could be linked to historical migration relations, documentation requirements, or geopolitical dynamics.

Variability in approval and refusal rates suggests complex decision-making processes. Potential influencing factors could include the completeness of applications, legal frameworks, and bilateral agreements between Ireland and other nations.

The continent-based analysis reveals disparities in application patterns between regions. Among the top 10 nationalities by received applications, Asian countries (India, China, Pakistan, Georgia) and African countries (Nigeria, Algeria, Somalia) are the most numerous, while Brazil, the single largest source, is the only South American entry. These patterns could be influenced by historical migration links and visa policies.

By selecting the **top 10 nationalities based on total received applications across all years** (Brazil, Nigeria, India, China, Pakistan, Georgia, Algeria, the United States of America, Turkiye, and Somalia), this analysis provides a clearer picture of historical trends and future predictions. Each nationality was analyzed individually with its own linear regression model, so that no single high-volume country dominates the fit. The projections for the next 10 years (2025-2034) are straight-line extrapolations from eight annual observations. Where they point to continued growth, as they do for Brazil, they underline the need for proactive policy adjustments and resource planning.

### Future Considerations:

- Expanding the dataset with variables such as visa categories, age demographics, and success rates based on document types could offer deeper insights into decision-making patterns.
- Employing advanced statistical models and machine learning could help predict application outcomes more accurately based on historical data and applicant profiles.
- Extending time-series forecasting techniques beyond linear regression, such as ARIMA models or Prophet forecasting, could improve prediction accuracy and handle complex data patterns.
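
Before comparing linear regression with alternatives such as ARIMA or Prophet, it helps to have a baseline error to beat. The sketch below holds out the last two years, fits a line to the earlier years, and reports the mean absolute error on the held-out years. The yearly counts are made up for illustration and the variable names are assumptions.

```python
import numpy as np

years = np.arange(2017, 2025)
# Hypothetical yearly counts for one nationality.
received = np.array([650, 900, 1100, 700, 950, 1600, 2000, 2300], dtype=float)

# Hold out the last two years as a test set.
train_years, test_years = years[:-2], years[-2:]
train_y, test_y = received[:-2], received[-2:]

# Fit a least-squares line on the training years only.
slope, intercept = np.polyfit(train_years, train_y, deg=1)
predicted = slope * test_years + intercept

# Mean absolute error on the held-out years.
mae = np.mean(np.abs(predicted - test_y))
print(f"Hold-out MAE: {mae:.1f} applications")
```

With only eight observations per nationality any such estimate is noisy, so it is best read as a rough comparison between models rather than a precise accuracy figure.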

This analysis provides a comprehensive foundation for further research into Ireland's migration policies, enabling policymakers to make data-driven decisions and improve transparency in the application process.



## References
- [Data.gov.ie. (2024). Domestic Residence & Permissions Applications and Decisions by Year and Nationality.](https://data.gov.ie/dataset/domestic-residence-permissions-applications-and-decisions-year-and-nationality)
- [McKinney, W. (2010). Data Structures for Statistical Computing in Python.](https://pandas.pydata.org/)
- [Plotly Technologies Inc. (n.d.). Interactive Graphing Library for Python.](https://plotly.com/)
- [NumPy Developers. (n.d.). NumPy: Fundamental package for scientific computing with Python.](https://numpy.org/)
- [Pycountry-Convert. (n.d.). Python Library for Country and Continent Conversion.](https://pypi.org/project/pycountry-convert/)
- [Image Source: Armax Group. (n.d.).](https://www.armaxgroup.com.ua/)
