# 💞💍 **Marriage Trends in India: Love vs. Arranged** 💑✨  

---

## 📌 **Overview of the Dataset & Analysis**  

---

## 📝 **Introduction**  
Marriage is one of the most significant social institutions in **India**, deeply rooted in culture, religion, and economic factors. How well a marriage fares, however, varies with many **sociological and economic aspects**, such as **education, caste, parental approval, and financial stability**.  

This dataset records **marriage patterns, divorce status, satisfaction levels, and key socioeconomic factors** for marriages in India. Through **detailed analysis**, we aim to uncover the patterns that shape Indian marriages.  

---

## 📊 **About the Dataset**  
📌 This dataset comprises **10,000 records** and **18 key attributes**, covering various aspects of marriage, including:  

✔ **Marriage Type:** Love vs. Arranged 💑  
✔ **Age at Marriage:** The age at which individuals got married 📅  
✔ **Education Level:** School, Graduate, Postgraduate, etc. 🎓  
✔ **Caste & Religion:** Whether the marriage was **within the same caste or inter-caste** ⛪🕌  
✔ **Parental Approval:** Full, Partial, or No approval 👨‍👩‍👧‍👦  
✔ **Dowry Exchange:** Whether dowry was involved in the marriage 💰  
✔ **Marital Satisfaction:** Low, Medium, or High 😢😐😊  
✔ **Divorce Status:** Whether the marriage ended in divorce ⚖️  
✔ **Income Level:** Low, Middle, or High-income families 💵  
✔ **Children Count:** Number of children in the marriage 👶  

Together, these attributes describe **modern Indian marriages** from several angles, including their challenges and success factors.  

---

## 🔍 **Objective of the Analysis**  
This analysis seeks to answer **key societal and behavioral questions**:  

📌 **Which type of marriage (Love vs. Arranged) is more common and more successful?** ❤️💍  
📌 **Does parental approval influence divorce rates?** 👨‍👩‍👦✅❌  
📌 **How do income level and education impact marital satisfaction?** 🎓💰  
📌 **Are inter-caste and inter-religion marriages more likely to face challenges?** 🏡🔄  
📌 **What factors contribute to a successful marriage in India?** 🎯  

By answering these questions, we can see **how societal factors are associated with marital stability**.  
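The parental-approval question can be answered directly with a divorce rate per approval level. The sketch below is a minimal version that assumes `Divorce_Status` is coded as `"Yes"`/`"No"` strings, the same coding the notebook's success-score function relies on. `divorce_rate_by` is a hypothetical helper, and the toy frame is made-up illustration data, not drawn from the dataset.

```python
import pandas as pd

def divorce_rate_by(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Share of marriages that ended in divorce within each level of `group_col`.

    Assumes Divorce_Status is coded as "Yes"/"No" strings.
    """
    is_divorced = df["Divorce_Status"].eq("Yes")
    # Mean of a boolean Series = fraction of True values in each group
    return is_divorced.groupby(df[group_col]).mean().sort_values(ascending=False)

# Toy data for illustration only (not from the real dataset)
toy = pd.DataFrame({
    "Parental_Approval": ["Full", "Full", "Partial", "Partial", "No", "No"],
    "Divorce_Status":    ["No",   "No",   "Yes",     "No",      "Yes", "Yes"],
})
print(divorce_rate_by(toy, "Parental_Approval"))
```

The same helper can be pointed at the income or education columns to explore the other questions. The exact column names for those attributes are an assumption here and should be checked against `df.columns`.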

---

## 🛠️ **Key Analysis & Techniques Used**  
To answer these questions, we use the following **data analysis techniques**:  

📊 **Exploratory Data Analysis (EDA):** Understanding **trends, distributions, and key statistics** 🔍  
📌 **Outlier Detection:** Identifying unusual patterns using **Boxplots** 📉  
⚡ **Feature Engineering:** Creating a **"Marriage Success Score"** for deeper insights 🎯  
📊 **Chi-Square Test:** Checking statistical relationships, e.g., **Marriage Type vs. Divorce Rate** 📏  
📈 **Visualizations:** Using **Bar Charts, KDE Plots, and Heatmaps** to identify trends 🎨  

Together, these techniques turn **raw data into evidence** about which factors go along with stable, satisfying **Indian marriages**.  

---

## 🎯 **Expected Outcomes**  
By the end of this analysis, learners will:  

✅ **Understand key trends in Indian marriages** 📊  
✅ **Learn how cultural, financial, and educational factors impact marital success** 🏡📚  
✅ **Gain hands-on experience in EDA, statistical testing, and data visualization** 💡📉  

This project is ideal for **data analysts, sociologists, policymakers, and anyone interested in understanding marriage trends in India**. 🚀  

🔹 **Ready to explore the hidden patterns in Indian marriages? Let's dive into the analysis!** 🔍 

## 📌 Table of Contents

1. **Import Libraries**
2. **Load & Inspect Data**
3. **Check for Duplicates & Missing Values**
4. **Summary Statistics**
5. **Outlier Detection & Treatment**
6. **Feature Engineering (Marriage Success Score)**
7. **Chi-Square Test (Marriage Type vs Divorce)**
8. **Visualizations (Marriage Trends, Divorce, Satisfaction)**
9. **KDE Plot (Age at Marriage & Marriage Type)**
10. **Correlation Heatmap**



# 📌 1️⃣ Import Libraries <a id='section-1'></a>

```python
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import chi2_contingency, ttest_ind

# Set plot style
sns.set_style("whitegrid")

# Display settings
pd.set_option('display.max_columns', None)  # Show all columns
```


# 📌 2️⃣ Load & Inspect Data 

```python
# Load the dataset
file_path = "/kaggle/input/marriage-trends-in-india-love-vs-arranged/marriage_data_india.csv"
df = pd.read_csv(file_path)

# Display basic information
print("\n🔹 Dataset Overview:")
df.info()

# Display first 5 records
print("\n🔹 First 5 Rows of Dataset:")
print(df.head())
```
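Because every later cell indexes columns by name, a quick schema check right after loading surfaces a renamed or missing column before it turns into a `KeyError` deep in the analysis. This is a minimal sketch: the list holds only the column names this notebook itself uses, and `missing_columns` is a hypothetical helper.

```python
import pandas as pd

# Columns referenced by later cells in this notebook
REQUIRED_COLUMNS = [
    "Marriage_Type", "Age_at_Marriage", "Children_Count", "Years_Since_Marriage",
    "Divorce_Status", "Marital_Satisfaction", "Parental_Approval",
]

def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# On the real data: assert not missing_columns(df), missing_columns(df)
toy = pd.DataFrame(columns=["Marriage_Type", "Age_at_Marriage"])
print(missing_columns(toy))
```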


# 📌 3️⃣ Check for Duplicates & Missing Values 

```python
# Check for duplicates
duplicate_count = df.duplicated().sum()
print(f"\n🔹 Number of Duplicate Records: {duplicate_count}")

# Check for missing values
missing_values = df.isnull().sum()
print("\n🔹 Missing Values Count:")
print(missing_values[missing_values > 0])  # Show only columns with missing values
```
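The cell above only reports duplicates and gaps. If any turn up, one simple way to handle them is to drop exact duplicate rows, then impute the median for numeric columns and the mode for categorical ones. This is a sketch of an assumed strategy, not necessarily what this dataset needs. `clean` is a hypothetical helper, and the toy frame is illustrative.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then impute missing values:
    median for numeric columns, most frequent value for the rest."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to impute in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode()  # NaN is excluded from the mode
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out

# Toy data: one duplicate row, one missing age, one missing marriage type
toy = pd.DataFrame({
    "Age_at_Marriage": [25, 25, None, 31, 28],
    "Marriage_Type": ["Love", "Love", "Arranged", None, "Love"],
})
print(clean(toy))
```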


# 📌 4️⃣ Summary Statistics

```python
# Summary statistics for numerical variables
print("\n🔹 Summary Statistics (Numerical Variables):")
print(df.describe())

# Summary statistics for categorical variables
print("\n🔹 Summary Statistics (Categorical Variables):")
print(df.describe(include="object").transpose())
```
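`describe(include="object")` reports only the single most frequent category. For questions like "which marriage type is more common", the share of every category is more informative. Below is a minimal sketch built on `value_counts(normalize=True)`. `category_shares` is a hypothetical helper, and the toy data is illustrative.

```python
import pandas as pd

def category_shares(df: pd.DataFrame, col: str) -> pd.Series:
    """Percentage of rows in each category of `col` (missing values excluded)."""
    return (df[col].value_counts(normalize=True) * 100).round(1)

toy = pd.DataFrame({"Marriage_Type": ["Arranged", "Arranged", "Arranged", "Love"]})
print(category_shares(toy, "Marriage_Type"))
```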


# 📌 5️⃣ Outlier Detection & Treatment 

```python
# Boxplot for numerical variables to detect outliers
plt.figure(figsize=(12, 5))
sns.boxplot(data=df[['Age_at_Marriage', 'Children_Count', 'Years_Since_Marriage']], palette="pastel")
plt.title("Outlier Detection using Boxplot")
plt.show()
```
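The boxplot detects outliers but does not treat them. One common treatment is to clip values to the Tukey fences, Q1 - 1.5·IQR and Q3 + 1.5·IQR. This is the same 1.5 × IQR rule seaborn's default boxplot uses to draw points beyond its whiskers. The sketch below assumes capping (winsorizing) is the desired treatment. `iqr_bounds` and `cap_outliers` are hypothetical helpers.

```python
import pandas as pd

def iqr_bounds(s: pd.Series, k: float = 1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_outliers(df: pd.DataFrame, cols, k: float = 1.5) -> pd.DataFrame:
    """Return a copy with each column in `cols` clipped to its IQR fences."""
    out = df.copy()
    for col in cols:
        low, high = iqr_bounds(out[col], k)
        out[col] = out[col].clip(lower=low, upper=high)
    return out

toy = pd.DataFrame({"Age_at_Marriage": [20, 22, 24, 26, 28, 80]})
print(cap_outliers(toy, ["Age_at_Marriage"]))
```

Whether to cap is a judgment call per column. A large `Children_Count`, for example, may be a genuine value rather than a data error, in which case flagging it is safer than clipping.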


# 📌 6️⃣ Feature Engineering (Marriage Success Score)

```python
# Define a function to assign a success score from Divorce Status & Marital Satisfaction:
#   3 = not divorced, High satisfaction
#   2 = not divorced, Medium satisfaction
#   1 = not divorced, any other satisfaction level (e.g., Low)
#   0 = divorced
def marriage_success(row):
    if row['Divorce_Status'] == "No" and row['Marital_Satisfaction'] == "High":
        return 3
    elif row['Divorce_Status'] == "No" and row['Marital_Satisfaction'] == "Medium":
        return 2
    elif row['Divorce_Status'] == "No":
        return 1
    else:
        return 0

# Apply function row by row to create the new feature
df["Marriage_Success_Score"] = df.apply(marriage_success, axis=1)

# Display first few rows with new feature
df[['Marriage_Type', 'Divorce_Status', 'Marital_Satisfaction', 'Marriage_Success_Score']].head()
```
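`df.apply(..., axis=1)` calls Python once per row. That is acceptable for 10,000 records but slow on larger data. The same scoring rules can be expressed as vectorized boolean masks with `np.select`, where the first matching condition wins, mirroring the `if`/`elif` order. A sketch follows. `success_score` is a hypothetical helper, and the toy rows are illustrative.

```python
import numpy as np
import pandas as pd

def success_score(df: pd.DataFrame) -> pd.Series:
    """Vectorized version of the marriage_success() rules:
    3/2/1 for non-divorced couples with High/Medium/other satisfaction, 0 otherwise."""
    not_divorced = df["Divorce_Status"].eq("No")
    sat = df["Marital_Satisfaction"]
    conditions = [
        not_divorced & sat.eq("High"),
        not_divorced & sat.eq("Medium"),
        not_divorced,
    ]
    return pd.Series(np.select(conditions, [3, 2, 1], default=0), index=df.index)

toy = pd.DataFrame({
    "Divorce_Status": ["No", "No", "No", "Yes"],
    "Marital_Satisfaction": ["High", "Medium", "Low", "High"],
})
print(success_score(toy).tolist())
```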


# 📌 7️⃣ Chi-Square Test (Marriage Type vs Divorce)

```python
# Create contingency table (rows: Marriage Type, columns: Divorce Status)
contingency_table = pd.crosstab(df['Marriage_Type'], df['Divorce_Status'])

# Perform Chi-Square Test of independence
chi2, p, _, _ = chi2_contingency(contingency_table)

print("\n🔹 Chi-Square Test for Marriage Type & Divorce:")
print(f"Chi-Square Statistic: {chi2:.2f}, P-value: {p:.5f}")

# Interpretation: a significant result indicates association, not causation
if p < 0.05:
    print("🔹 Significant relationship found (Marriage Type and Divorce Status are associated).")
else:
    print("🔹 No significant relationship found.")
```
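With 10,000 records, even a very weak association can produce a small p-value, so the test is easier to interpret alongside an effect size and a check of its assumptions. The sketch below computes Cramér's V, V = √(χ² / (n·(min(r, c) − 1))), and applies the common "every expected count ≥ 5" rule of thumb. It uses the uncorrected χ², because `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default. `cramers_v` and `expected_counts_ok` are hypothetical helpers.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(table: pd.DataFrame) -> float:
    """Cramér's V effect size: 0 = no association, 1 = perfect association."""
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

def expected_counts_ok(table: pd.DataFrame, minimum: float = 5.0) -> bool:
    """Rule of thumb for the chi-square approximation: all expected counts >= minimum."""
    _, _, _, expected = chi2_contingency(table)
    return bool((expected >= minimum).all())

# Toy table (illustrative counts, not from the dataset)
toy = pd.DataFrame([[30, 10], [10, 30]], index=["Arranged", "Love"], columns=["No", "Yes"])
print(f"Cramér's V: {cramers_v(toy):.2f}, expected counts OK: {expected_counts_ok(toy)}")
```

On the real data, pass `contingency_table` from the cell above to both helpers.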


# 📌 8️⃣ Visualizations (Marriage Trends, Divorce, Satisfaction)

```python
# Define figure size for multiple plots (2 x 2 grid)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Marriage Type Distribution
sns.countplot(data=df, x="Marriage_Type", palette="coolwarm", ax=axes[0, 0])
axes[0, 0].set_title("Marriage Type Distribution")

# Divorce Status Distribution
sns.countplot(data=df, x="Divorce_Status", palette="viridis", ax=axes[0, 1])
axes[0, 1].set_title("Divorce Status Distribution")

# Marital Satisfaction Levels
sns.countplot(data=df, x="Marital_Satisfaction", palette="pastel", ax=axes[1, 0])
axes[1, 0].set_title("Marital Satisfaction Levels")

# Parental Approval Distribution
sns.countplot(data=df, x="Parental_Approval", palette="Set2", ax=axes[1, 1])
axes[1, 1].set_title("Parental Approval Distribution")

# Adjust layout for better visibility
plt.tight_layout()
plt.show()
```
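Raw counts make a large group look dominant in every chart. To compare divorce between marriage types of different sizes, row percentages are more direct. A minimal sketch using `pd.crosstab(..., normalize="index")` follows. `row_percentages` is a hypothetical helper, and the toy data is illustrative.

```python
import pandas as pd

def row_percentages(df: pd.DataFrame, row: str, col: str) -> pd.DataFrame:
    """Cross-tabulate two categorical columns as percentages within each row group."""
    return (pd.crosstab(df[row], df[col], normalize="index") * 100).round(1)

toy = pd.DataFrame({
    "Marriage_Type": ["Arranged"] * 4 + ["Love"] * 2,
    "Divorce_Status": ["No", "No", "No", "Yes", "No", "Yes"],
})
print(row_percentages(toy, "Marriage_Type", "Divorce_Status"))
```

The resulting frame can be drawn as a stacked bar chart with `.plot(kind="bar", stacked=True)`.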


# 📌 9️⃣ KDE Plot (Age at Marriage & Marriage Type)

```python
# Compare age-at-marriage distributions for Love vs. Arranged marriages
plt.figure(figsize=(10, 5))
sns.kdeplot(df.loc[df['Marriage_Type'] == 'Love', 'Age_at_Marriage'], label="Love Marriage", fill=True)
sns.kdeplot(df.loc[df['Marriage_Type'] == 'Arranged', 'Age_at_Marriage'], label="Arranged Marriage", fill=True)
plt.title("Age at Marriage Distribution by Marriage Type")
plt.xlabel("Age at Marriage")
plt.ylabel("Density")
plt.legend()
plt.show()
```
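The notebook imports `ttest_ind` but never uses it. One natural use is to check whether any gap between the two KDE curves is statistically significant. The sketch below runs Welch's t-test (`equal_var=False`, which does not assume equal variances in the two groups). `compare_age_at_marriage` is a hypothetical helper, and the toy ages are illustrative.

```python
import pandas as pd
from scipy.stats import ttest_ind

def compare_age_at_marriage(df: pd.DataFrame):
    """Welch's t-test on Age_at_Marriage, Love vs. Arranged.
    Returns (mean difference Love - Arranged, two-sided p-value)."""
    love = df.loc[df["Marriage_Type"] == "Love", "Age_at_Marriage"].dropna()
    arranged = df.loc[df["Marriage_Type"] == "Arranged", "Age_at_Marriage"].dropna()
    result = ttest_ind(love, arranged, equal_var=False)
    return love.mean() - arranged.mean(), result.pvalue

toy = pd.DataFrame({
    "Marriage_Type": ["Love"] * 5 + ["Arranged"] * 5,
    "Age_at_Marriage": [27, 28, 29, 30, 31, 22, 23, 24, 25, 26],
})
diff, p = compare_age_at_marriage(toy)
print(f"Mean difference: {diff:.2f} years, p = {p:.4f}")
```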


# 📌 🔟 Correlation Heatmap 

```python
# Select only numerical columns
numeric_df = df.select_dtypes(include=[np.number])

# Compute correlation matrix
correlation_matrix = numeric_df.corr()

# Plot heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)

# Labels and title
plt.title("Correlation Matrix of Numerical Features")

# Show plot
plt.show()
```
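The heatmap leaves out ordinal categories such as `Marital_Satisfaction` because they are stored as text. One way to include them is to map the labels to ordered codes and use Spearman's rank correlation, which suits ordinal codes better than Pearson's. The sketch below assumes the Low < Medium < High ordering described earlier. `SATISFACTION_ORDER` and `encode_ordinal` are hypothetical names, and the toy data is illustrative.

```python
import pandas as pd

# Assumed ordering of the satisfaction labels
SATISFACTION_ORDER = {"Low": 0, "Medium": 1, "High": 2}

def encode_ordinal(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a numeric Satisfaction_Code column added."""
    out = df.copy()
    out["Satisfaction_Code"] = out["Marital_Satisfaction"].map(SATISFACTION_ORDER)
    return out

toy = pd.DataFrame({
    "Marital_Satisfaction": ["Low", "Medium", "High", "High"],
    "Years_Since_Marriage": [2, 5, 9, 12],
})
encoded = encode_ordinal(toy)
# Rank-based correlation over the numeric columns only
print(encoded.select_dtypes("number").corr(method="spearman"))
```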

# 🚀 Final Thoughts
This structured EDA notebook supports step-by-step execution, with modular sections for data checks, feature engineering, statistical tests, and visualizations. It aims for:

✅ **Easy readability**  
✅ **A professional, structured approach**  
✅ **Data-driven insights backed by statistical validation**