This is the work I performed as a virtual intern, solving a real-world Data Analytics problem provided by ANZ.
The main areas I worked on are:
- Data Exploration
- Predictive Analysis
Background Information (Data Exploration) :
This task is based on a synthesised transaction dataset containing 3 months’ worth of transactions for 100 hypothetical customers. It contains purchases, recurring transactions, and salary transactions. The dataset is designed to simulate realistic transaction behaviours that are observed in ANZ’s real transaction data, so many of the insights you can gather from the tasks below will be genuine. The relevant dataset is linked.
The Task :
- Load the transaction dataset below into an analysis tool of your choice (Excel, R, SAS, Tableau, or similar).
- Start by doing some basic checks: are there any data issues? Does the data need to be cleaned?
- Gather some interesting overall insights about the data. For example: what is the average transaction amount? How many transactions do customers make each month, on average?
- Segment the dataset by transaction date and time. Visualise transaction volume and spending over the course of an average day or week. Consider the effect of any outliers that may distort your analysis.
- For a challenge: what insights can you draw from the location information provided in the dataset?
- Put together 2-3 slides summarising your most interesting findings to ANZ management.
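The basic checks and summary questions in this task could be approached along the following lines. This is a minimal sketch in Python using only the standard library, not the analysis actually performed. The field names (`customer_id`, `timestamp`, `amount`) and the sample rows are assumptions made up for illustration and do not reflect the real dataset's schema.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical rows: field names and values are assumptions, not real data.
SAMPLE = [
    {"customer_id": "C1", "timestamp": "2018-08-01 09:15:00", "amount": "20.00"},
    {"customer_id": "C1", "timestamp": "2018-08-06 18:30:00", "amount": "35.50"},
    {"customer_id": "C1", "timestamp": "2018-09-03 12:00:00", "amount": "10.00"},
    {"customer_id": "C2", "timestamp": "2018-08-01 08:05:00", "amount": "4.50"},
    {"customer_id": "C2", "timestamp": "2018-09-05 20:45:00", "amount": ""},  # missing amount
]

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def clean(rows):
    """Parse types; return (usable rows, rows with data issues)."""
    good, bad = [], []
    for row in rows:
        try:
            good.append({
                "customer_id": row["customer_id"],
                "timestamp": datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
                "amount": float(row["amount"]),
            })
        except (KeyError, ValueError):
            # Missing field, unparseable date, or non-numeric amount.
            bad.append(row)
    return good, bad


def average_amount(rows):
    """Mean transaction amount across all usable rows."""
    return mean(r["amount"] for r in rows)


def monthly_transactions_per_customer(rows):
    """Average transaction count per customer per calendar month.

    Only customer-months with at least one transaction are counted.
    """
    counts = Counter(
        (r["customer_id"], r["timestamp"].year, r["timestamp"].month) for r in rows
    )
    return mean(counts.values())


def by_weekday(rows):
    """Map weekday name to (transaction count, total spend)."""
    count = Counter()
    spend = defaultdict(float)
    for r in rows:
        day = WEEKDAYS[r["timestamp"].weekday()]
        count[day] += 1
        spend[day] += r["amount"]
    return {day: (count[day], spend[day]) for day in WEEKDAYS if count[day]}


good, bad = clean(SAMPLE)
print("rows with issues:", len(bad))
print("average amount:", average_amount(good))
print("transactions per customer-month:", monthly_transactions_per_customer(good))
print("volume and spend by weekday:", by_weekday(good))
```

Keeping the rejected rows in a separate list, rather than dropping them silently, makes it easier to report how much of the data needed cleaning. For the outlier question, comparing `statistics.median` with the mean is one quick way to see whether a few large transactions are distorting the average.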
Background Information (Predictive Analysis) :
This task uses the same synthesised transaction dataset described above: 3 months’ worth of transactions for 100 hypothetical customers, covering purchases, recurring transactions, and salary transactions.
The Task :
- For this task, you’ll likely need to use statistical software such as R, SAS, or Python.
- Using the same transaction dataset, identify the annual salary for each customer
- Explore correlations between annual salary and various customer attributes (e.g. age). These attributes could be those that are readily available in the data (e.g. age) or those that you construct or derive yourself (e.g. those relating to purchasing behaviour). Visualise any interesting correlations using a scatter plot.
- Build a simple regression model to predict the annual salary for each customer using the attributes you identified above
- How accurate is your model? Should ANZ use it to segment customers (for whom it does not have this data) into income brackets for reporting purposes?
- For a challenge: build a decision-tree-based model to predict salary. Does it perform better? How would you accurately test the performance of this model?
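The salary and regression steps in this task can be sketched as follows. This is an illustrative outline under stated assumptions, not the model actually built. It assumes salary payments are marked by a `txn_description` of `"PAY/SALARY"`, that each row carries the customer's `age`, and that annual salary can be approximated by scaling the 3 months of observed payments up to 12. The sample rows are made up, and the least-squares fit is written out by hand to keep the example free of third-party libraries.

```python
from collections import defaultdict


def _row(customer_id, age, description, amount):
    # Hypothetical record layout; field names are assumptions.
    return {"customer_id": customer_id, "age": age,
            "txn_description": description, "amount": amount}


# Made-up sample: three salary payments per customer plus one purchase.
SAMPLE = (
    [_row("C1", 20, "PAY/SALARY", 4000.0)] * 3
    + [_row("C2", 30, "PAY/SALARY", 5000.0)] * 3
    + [_row("C3", 40, "PAY/SALARY", 6500.0)] * 3
    + [_row("C1", 20, "POS", 25.0)]  # a purchase, ignored below
)


def annual_salaries(rows, months_observed=3):
    """Sum salary payments per customer and scale them to 12 months."""
    totals = defaultdict(float)
    for r in rows:
        if r["txn_description"] == "PAY/SALARY":
            totals[r["customer_id"]] += r["amount"]
    scale = 12 / months_observed
    return {cid: total * scale for cid, total in totals.items()}


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


salaries = annual_salaries(SAMPLE)
ages = {r["customer_id"]: r["age"] for r in SAMPLE}
customers = sorted(salaries)
xs = [ages[c] for c in customers]
ys = [salaries[c] for c in customers]

slope, intercept = fit_line(xs, ys)
print("annual salaries:", salaries)
print("slope:", slope, "intercept:", intercept)
print("R^2 on the fitting data:", r_squared(xs, ys, slope, intercept))
```

An R² computed on the same customers used for fitting tends to overstate accuracy. With only 100 customers, holding some out or using cross-validation gives a fairer answer to the question of how well either the regression or a decision-tree model would perform on customers whose salary is unknown.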