<a href="https://colab.research.google.com/github/NatashaKamami/Python-Data-Analysis/blob/main/Credit_Spending_India.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Importing the necessary libraries and loading the dataset**

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px


In [None]:
card_data = pd.read_csv('/content/Credit card transactions - India - Simple.csv')

In [None]:
card_data.head()

Unnamed: 0,index,City,Date,Card Type,Exp Type,Gender,Amount
0,0,"Delhi, India",29-Oct-14,Gold,Bills,F,82475
1,1,"Greater Mumbai, India",22-Aug-14,Platinum,Bills,F,32555
2,2,"Bengaluru, India",27-Aug-14,Silver,Bills,F,101738
3,3,"Greater Mumbai, India",12-Apr-14,Signature,Bills,F,123424
4,4,"Bengaluru, India",5-May-15,Gold,Bills,F,171574


##**Data Cleaning**

In [None]:
card_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26052 entries, 0 to 26051
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   index      26052 non-null  int64 
 1   City       26052 non-null  object
 2   Date       26052 non-null  object
 3   Card Type  26052 non-null  object
 4   Exp Type   26052 non-null  object
 5   Gender     26052 non-null  object
 6   Amount     26052 non-null  int64 
dtypes: int64(2), object(5)
memory usage: 1.4+ MB


In [None]:
card_data.isnull().sum()

Unnamed: 0,0
index,0
City,0
Date,0
Card Type,0
Exp Type,0
Gender,0
Amount,0


In [None]:
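# Date strings look like 29-Oct-14 in the preview above, so pass the format explicitly
# (this avoids the format-inference warning); unparseable values become NaT
card_data['Date'] = pd.to_datetime(card_data['Date'], format='%d-%b-%y', errors='coerce')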
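Because `errors='coerce'` silently turns unparseable strings into `NaT`, it may be worth counting how many rows failed to parse. The following is a minimal sketch on a made-up three-row sample (the `sample` frame and its malformed last value are assumptions for illustration, not rows from the dataset):

```python
import pandas as pd

# Hypothetical sample mirroring the 'Date' strings; the last value is deliberately malformed
sample = pd.DataFrame({"Date": ["29-Oct-14", "5-May-15", "not a date"]})

# Unparseable strings become NaT instead of raising an error
parsed = pd.to_datetime(sample["Date"], format="%d-%b-%y", errors="coerce")

# Count the rows that failed to parse rather than letting them pass silently
n_failed = int(parsed.isna().sum())
print(f"{n_failed} of {len(sample)} dates could not be parsed")
```

Running the same count on `card_data['Date']` would show whether any transactions dropped out of the later time-based plots.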


In [None]:
card_data[['City', 'Country']] = card_data['City'].str.split(',', expand=True)
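The split above assumes every `City` value contains exactly one comma. A slightly more defensive variant, sketched here on a made-up two-row sample, limits the split to the first comma and strips the leftover whitespace (the `sample` and `parts` names are illustrative only):

```python
import pandas as pd

# Hypothetical sample in the same "City, Country" layout as the dataset
sample = pd.DataFrame({"City": ["Delhi, India", "Greater Mumbai, India"]})

# n=1 splits on the first comma only, so the result always has at most two columns
parts = sample["City"].str.split(",", n=1, expand=True)

# Strip the space that follows the comma in the country part
sample["City"] = parts[0].str.strip()
sample["Country"] = parts[1].str.strip()
print(sample)
```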

In [None]:
card_data.drop(columns=['index'], inplace=True)

In [None]:
card_data.drop(columns=['Country'], inplace=True)

In [None]:
card_data.columns

Index(['City', 'Date', 'Card Type', 'Exp Type', 'Gender', 'Amount'], dtype='object')

In [None]:
card_data.nunique()

Unnamed: 0,0
City,986
Date,600
Card Type,4
Exp Type,6
Gender,2
Amount,24972


In [None]:
# Change F to Female and M to Male
card_data['Gender'] = card_data['Gender'].replace('F', 'Female')
card_data['Gender'] = card_data['Gender'].replace('M', 'Male')
card_data

Unnamed: 0,City,Date,Card Type,Exp Type,Gender,Amount
0,Delhi,2014-10-29,Gold,Bills,Female,82475
1,Greater Mumbai,2014-08-22,Platinum,Bills,Female,32555
2,Bengaluru,2014-08-27,Silver,Bills,Female,101738
3,Greater Mumbai,2014-04-12,Signature,Bills,Female,123424
4,Bengaluru,2015-05-05,Gold,Bills,Female,171574
...,...,...,...,...,...,...
26047,Kolkata,2014-06-22,Silver,Travel,Female,128191
26048,Pune,2014-08-03,Signature,Travel,Male,246316
26049,Hyderabad,2015-01-16,Silver,Travel,Male,265019
26050,Kanpur,2014-09-14,Silver,Travel,Male,88174


#**Data Analysis: Credit Card Spending Habits in India**
##**1. What is the most preferred card type associated with spending in India?**

In [None]:
# Card type preference
fig = px.pie(card_data, names="Card Type", values="Amount", color="Card Type", title="Card Type Preference")
fig.show()

The pie chart shows each card type's share of the total amount spent in India. Silver cards account for the largest share of spending, making Silver the most preferred card type by spending, while Gold cards account for the smallest share. However, the segments are similar in size, which suggests that spending is spread fairly evenly across card types.
The lead of Silver cards may be attributed to their affordability: they typically carry lower fees and fewer eligibility requirements than Gold cards, making them accessible in a price-sensitive market like India. Designed for entry-level users and those with moderate incomes, Silver cards cater to a significant portion of India's population. Indian consumers' preference for cost-effective and straightforward financial products also aligns with the features of Silver cards, while the stricter requirements and higher costs of Gold cards limit their appeal.
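Since the pie chart weights each card type by `Amount`, "most preferred" here means the largest share of spending, which is not necessarily the largest number of transactions. A minimal sketch of how the two measures could be compared side by side, using made-up rows rather than the real `card_data`:

```python
import pandas as pd

# Hypothetical rows for illustration; the real analysis would use card_data
sample = pd.DataFrame({
    "Card Type": ["Silver", "Silver", "Silver", "Gold", "Gold", "Platinum"],
    "Amount": [100, 200, 100, 500, 50, 150],
})

# Total spending and number of transactions per card type
summary = sample.groupby("Card Type")["Amount"].agg(total="sum", transactions="count")

# Express both measures as a percentage of the overall total
summary["spend_share_pct"] = (summary["total"] / summary["total"].sum() * 100).round(1)
summary["count_share_pct"] = (summary["transactions"] / summary["transactions"].sum() * 100).round(1)

print(summary.sort_values("total", ascending=False))
```

In this toy sample the card type with the most transactions is not the one with the highest spending, which is why it helps to state which measure a "preference" claim rests on.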

##**2. Gender spending patterns and their card type preference**

In [None]:
# Spending patterns by gender and card type
total_amount = card_data.groupby(['Card Type', 'Gender'])['Amount'].sum().reset_index()
fig = px.histogram(total_amount, x="Card Type", y="Amount", color="Gender", barmode="group",
                   title="Spending by Gender and Card Type",
                   histfunc="sum")
fig.show()

The histogram visualizes spending patterns by gender and card type. For each card type, Females spend more in total than Males. Notably, the Silver card type shows significantly higher Female spending, which aligns with the finding that Silver is the most preferred card type in India.
Women's higher spending may stem from a combination of social roles, economic independence, and cultural factors. As primary household purchasers, women often manage family budgets and buy essentials like groceries and household goods. Their growing economic independence and participation in the workforce also provide them with greater disposable income and spending autonomy. Cultural factors play a role in spending patterns as well, since women are often responsible for planning and spending on festivals, weddings, and other significant events.

##**3. Gender spending patterns and their card type preference in the Top 10 Cities**

In [None]:
# Find the top 10 cities by total spending
top_cities = card_data.groupby('City', as_index=False)['Amount'].sum().nlargest(10, 'Amount')

# Filter the original dataset for the top 10 cities
top_cities_data = card_data[card_data['City'].isin(top_cities['City'])]

fig = px.histogram(top_cities_data, x="City", y="Amount", color="Gender", facet_col="Card Type",
                   title="Spending Distribution by Gender and Card Type in Top Cities",barmode="stack")

# Customize layout (update_xaxes applies the settings to every facet, not only the first)
fig.update_xaxes(title_text="City", tickangle=45)
fig.update_layout(yaxis_title="Total Amount Spent",
                  legend_title="Gender", template="plotly_white")
fig.show()



The histogram displays total spending per city, broken down by card type and gender, across the top 10 cities. Spending is concentrated in Delhi, Bengaluru, Greater Mumbai, and Ahmedabad, which lead in total spending. These cities are economic and financial hubs with large populations, greater disposable incomes, and better access to digital payment infrastructure, all of which contribute to higher spending levels. Affluent residents, more businesses, and increased credit access also drive spending in these areas.
In terms of card usage, Greater Mumbai leads in Gold card spending (possibly due to its wealthy population), while Bengaluru leads in Silver and Signature card spending, and Ahmedabad shows a preference for Platinum cards. Overall, gender-based spending patterns show that women are the dominant spenders in most cities, except Kolkata, where men lead in spending across all card types. This anomaly in Kolkata warrants further exploration to understand the factors driving this divergence.
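The claim about which gender leads in each city can also be read off a table rather than the chart. The sketch below uses a small made-up sample (the values are not from the dataset) and a `Leading Gender` column introduced only for illustration:

```python
import pandas as pd

# Hypothetical transactions; the real analysis would use top_cities_data
sample = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Kolkata", "Kolkata", "Delhi", "Kolkata"],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Amount": [300, 200, 100, 250, 150, 50],
})

# One row per city, one column per gender, cells hold total spending
by_gender = sample.pivot_table(index="City", columns="Gender", values="Amount",
                               aggfunc="sum", fill_value=0)

# Label each city with the gender that has the higher total
by_gender["Leading Gender"] = by_gender[["Female", "Male"]].idxmax(axis=1)
print(by_gender)
```

Adding `"Card Type"` to the pivot index would give the same comparison per card type within each city.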

##**4. Total spending by expense type in the top 10 cities**

In [None]:
# Total spending by expense type
fig = px.treemap(top_cities_data, path=["City", "Exp Type"], values="Amount", title="Expenditure Type Trends by City")
fig.show()

The treemap illustrates total spending by expense type across the top 10 cities. In most cities, Bills dominate as the primary expense type, followed closely by Grocery, which is also a leading category in several cities. This indicates that these two expense types are both popular and essential in most cities. However, Kolkata stands out with Entertainment as its most dominant expense type, deviating from the broader trend. This suggests location-specific spending habits that differ significantly from the other cities. The larger rectangles for essential categories, such as Bills and Grocery, highlight a focus on necessities, while larger areas for discretionary categories, such as Entertainment or Travel, could reflect higher disposable income or distinct lifestyle trends in those cities.  

Given Kolkata's divergence both in gender spending patterns and its preference for Entertainment as the leading expense type, further investigation into spending habits in this city could provide valuable insights into its unique behavioral trends.
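A treemap makes the dominant category easy to see but hard to quote. As a rough sketch, the leading expense type per city could be extracted with a grouped `idxmax`; the sample rows below are invented for illustration and `top_expense` is a hypothetical name:

```python
import pandas as pd

# Hypothetical transactions; the real analysis would use top_cities_data
sample = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Delhi", "Kolkata", "Kolkata", "Kolkata"],
    "Exp Type": ["Bills", "Grocery", "Bills", "Entertainment", "Bills", "Entertainment"],
    "Amount": [400, 300, 100, 350, 200, 50],
})

# Total spending for every (city, expense type) pair
totals = sample.groupby(["City", "Exp Type"], as_index=False)["Amount"].sum()

# For each city, keep the row whose total is the largest
top_expense = totals.loc[totals.groupby("City")["Amount"].idxmax()].reset_index(drop=True)
print(top_expense)
```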

##**5. Expense-Specific Spending by Card Type**

In [None]:
# Spending patterns by expense type and card type in the top 10 cities
total_amount = top_cities_data.groupby(['Card Type', 'Exp Type'])['Amount'].sum().reset_index()
fig = px.histogram(total_amount, x="Exp Type", y="Amount", color="Card Type", barmode="group",
                   title="Spending by Expense Type and Card Type",
                   histfunc="sum")
fig.show()

The histogram compares spending by expense type and card type in the top 10 cities. Silver cards lead in every expense type except Travel, where Gold cards lead. Silver cards likely dominate most expense types because they cater to a broader audience and are more commonly used for routine expenses like groceries, food, and bills. Gold cards, on the other hand, often come with perks tailored to travelers, such as airline miles, lounge access, travel insurance, and higher reward points for travel-related purchases like flights, hotels, and fuel. These incentives likely make them the preferred choice for travel expenses. Gold cardholders may also represent a smaller but more affluent segment that travels frequently and leverages these benefits for high-value purchases.


#**6. Why does Kolkata follow a contrary trend when it comes to total spending on expense types and gender spending patterns?**

##**6.1 Total Spending by expense type**

In [None]:
# Filter the data for the city Kolkata
kolkata_data = card_data[card_data['City'] == 'Kolkata']

# Calculate total spending per expense type by grouping and summing
total_spending = kolkata_data.groupby('Exp Type')['Amount'].sum().reset_index()

# Sort the expense types by total spending in descending order
total_spending_sorted = total_spending.sort_values(by='Amount', ascending=False)

# Create a Plotly bar chart with bars arranged in descending order of total spending
fig = px.bar(total_spending_sorted, x='Exp Type', y='Amount', title="Spending Depending on Expense Type")

# Customize layout
fig.update_layout(xaxis=dict(title="Expense Type", tickmode="array", tickvals=total_spending_sorted['Exp Type']),
                  yaxis_title="Total Amount Spent", template="plotly_white")
fig.show()


The bar chart shows the distribution of spending by expense type in Kolkata, with Entertainment being the expense type with the highest spending. This might be attributed to several socio-economic and cultural factors. Kolkata is known for its rich cultural heritage, and entertainment plays a huge role in it. Major festivals like Durga Puja and Poila Boishakh (Bengali New Year) often involve significant cultural activities such as theater performances, concerts, cinema, and community events, which could drive up spending in the Entertainment category. The city also hosts numerous art festivals, film festivals, and musical and theatrical performances, drawing large audiences and fostering spending on these experiences.
Kolkata also has a thriving cinema culture, rooted in Bengali cinema, and is home to several renowned movie theaters and historic cultural venues, which may further drive entertainment-related expenses. Digital platforms such as Netflix and Amazon Prime, along with online gaming and subscriptions, may add to entertainment spending, especially in cities like Kolkata where internet access and digital adoption are growing rapidly. Lastly, Kolkata is a popular tourist destination, with attractions such as the Victoria Memorial, the Indian Museum, Howrah Bridge, and various parks and cultural centers that also contribute to entertainment spending through visits. Taken together, these factors suggest that entertainment is not just a leisure activity but an integral part of the urban lifestyle in Kolkata.


##**6.2 Total spending by expense type grouped by Gender**

In [None]:
# Create the grouped bar chart (px.histogram sums Amount per expense type when y is given)
fig = px.histogram(kolkata_data, x="Exp Type", y="Amount", color="Gender",
             title="Spending depending on Expense type grouped by Gender",
             barmode="group")

# Customize layout
fig.update_layout(xaxis=dict(title="Expense Type", tickangle=45),
                  yaxis_title="Total Amount Spent",
                  legend_title="Gender",
                  template="plotly_white")

# Show the plot
fig.show()

The bar chart for Kolkata highlights distinct gender-based spending patterns across expense types. Men outspend women in nearly all categories, particularly in utility-related expenses like Fuel and Bills. This suggests that men in Kolkata may have greater expenditures related to transportation, home maintenance, or other essential services. Women, on the other hand, dominate the Travel category, which could reflect an inclination toward leisure travel, vacations, or spending on experiences, often considered lifestyle-oriented expenses. These differences may be shaped by various social, cultural, and economic factors, such as varying priorities, lifestyle choices, or access to disposable income.


##**6.3 Monthly spending trend in Kolkata**

In [None]:
# Work on a copy to avoid pandas' SettingWithCopyWarning
kolkata_data = kolkata_data.copy()

# Extract the year-month from the 'Date' column
kolkata_data['Year-Month'] = kolkata_data['Date'].dt.strftime('%Y-%m')

# Aggregate data to calculate total spending per month
spending_trend_kolkata = kolkata_data.groupby('Year-Month', as_index=False)['Amount'].sum()

# Create the line plot
fig = px.line(spending_trend_kolkata, x='Year-Month', y='Amount', title='Monthly Spending Trend in Kolkata',
              labels={'Year-Month': 'Month', 'Amount': 'Total Amount Spent'}, template='plotly_white')

# Customize layout
fig.update_layout(xaxis_title='Month', yaxis_title='Total Spending', title_font_size=18, xaxis=dict(tickangle=45))

# Show the plot
fig.show()


The line plot of monthly spending in Kolkata shows peaks in January, April, August, and October, with January 2014 being the highest-spending month. January marks New Year celebrations and cultural events like Makar Sankranti, driving spending on food, clothing, gifts, and travel. April aligns with the Bengali New Year (Poila Boishakh), leading to high spending on traditional attire and gifts. August is marked by Durga Puja preparations, and October by the Durga Puja celebrations themselves. As the city's biggest and most anticipated festival, Durga Puja drives significant expenditure on shopping, decorations, and travel.

Low-spending months such as February, March, May, September, and November reflect quieter periods between major celebrations. Overall, spending peaks are linked to key cultural festivals, while troughs occur during transition periods before the onset of major festivals or during recovery periods after the festivities, highlighting the influence of Kolkata’s cultural calendar on consumer behavior.
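Monthly totals can also rise or fall simply because a month has more or fewer days with recorded transactions (for example, a partial first or last month). One way to sanity-check the peaks and troughs is to look at the number of active days next to each total. This is a sketch on made-up data; the `Month`, `active_days`, and `avg_per_active_day` names are introduced here for illustration:

```python
import pandas as pd

# Hypothetical transactions; the real analysis would use kolkata_data
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-05", "2014-01-20", "2014-02-10",
                            "2014-02-11", "2014-03-01"]),
    "Amount": [100, 300, 50, 70, 500],
})

# Calendar month of each transaction as a monthly Period
sample["Month"] = sample["Date"].dt.to_period("M")

# Total spending and number of distinct days with transactions in each month
monthly = sample.groupby("Month").agg(
    total=("Amount", "sum"),
    active_days=("Date", "nunique"),
)

# Spending per active day puts months with uneven coverage on a comparable footing
monthly["avg_per_active_day"] = monthly["total"] / monthly["active_days"]
print(monthly)
```

If a peak month also has far more active days than its neighbours, data coverage may explain part of the peak alongside the festival calendar.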

##**6.4 Daily spending trends in Kolkata**

In [None]:
# Aggregate data to calculate total spending per day for Kolkata
spending_trend_kolkata = kolkata_data.groupby('Date', as_index=False)['Amount'].sum()

# Create the line plot for Kolkata
fig = px.line(spending_trend_kolkata, x='Date', y='Amount', title='Spending Trend in Kolkata Over Time',
              labels={'Date': 'Date', 'Amount': 'Total Amount Spent'},
              template='plotly_white')

# Customize layout
fig.update_layout(xaxis_title='Date',
                  yaxis_title='Total Spending',
                  title_font_size=18)

# Show the plot
fig.show()


The daily spending trend in Kolkata also shows peaks around major festivals and events, such as the Bengali New Year, Durga Puja, and post-New Year celebrations, driven by shopping, travel, and festivities. Spending dips occur in post-festival periods, such as after Dussehra and Diwali, and during seasonal transitions, reflecting budgeting and reduced demand. Kolkata's cultural calendar significantly influences consumer behavior, with Entertainment expenses dominating overall spending patterns.

##**6.5 Monthly spending trends in the top 10 cities compared to monthly spending trends in Kolkata**

In [None]:
# Work on a copy to avoid pandas' SettingWithCopyWarning
top_cities_data = top_cities_data.copy()

# Aggregate total spending per month for the top 10 cities
# ('Date' is already a datetime column, so no re-parsing is needed)
top_cities_data['Year-Month'] = top_cities_data['Date'].dt.strftime('%Y-%m')
spending_trend = top_cities_data.groupby('Year-Month', as_index=False)['Amount'].sum()
spending_trend['Category'] = 'Top 10 Cities'

# Aggregate total spending per month for Kolkata
kolkata_data['Year-Month'] = kolkata_data['Date'].dt.strftime('%Y-%m')
spending_trend_kolkata = kolkata_data.groupby('Year-Month', as_index=False)['Amount'].sum()
spending_trend_kolkata['Category'] = 'Kolkata'

# Combine both datasets
combined_spending_trend = pd.concat([spending_trend, spending_trend_kolkata])

# Create the line plot
fig = px.line(combined_spending_trend, x='Year-Month', y='Amount', color='Category',
              title='Spending Trend Comparison: Top 10 Cities vs Kolkata',
              labels={'Year-Month': 'Month', 'Amount': 'Total Amount Spent', 'Category': 'Spending Category'},
              template='plotly_white')

# Customize layout
fig.update_layout(xaxis_title='Month', yaxis_title='Total Spending', title_font_size=18)
fig.show()
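Because the combined total for ten cities is much larger than Kolkata's total, plotting both on the same axis can flatten the Kolkata line and make the shapes hard to compare. One option, sketched below with made-up numbers, is to express each month as a percentage of its own category's total before plotting (the new column name is an assumption for illustration):

```python
import pandas as pd

# Hypothetical monthly totals in the same layout as combined_spending_trend
combined = pd.DataFrame({
    "Year-Month": ["2014-01", "2014-02", "2014-01", "2014-02"],
    "Amount": [1000, 3000, 100, 100],
    "Category": ["Top 10 Cities", "Top 10 Cities", "Kolkata", "Kolkata"],
})

# Divide each month by the total of its own category so both series share a 0-100 scale
category_total = combined.groupby("Category")["Amount"].transform("sum")
combined["Share of Category Total (%)"] = combined["Amount"] / category_total * 100

print(combined)
```

Passing the new column as `y` to `px.line` would then compare the timing of peaks rather than the absolute amounts.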
