Gustavo Cunha ds-gustavo-cunha

I built a portfolio webpage to present my projects in a more straightforward and personal way: https://ds-gustavo-cunha.github.io/projects-portfolio/. If you prefer, you can read a brief introduction below. 😉

The main objective of this personal portfolio is to demonstrate my skills in solving Data Science business challenges.

Gustavo Cunha

Data scientist at Nubank, AI mentor at Social Good Brasil, AI & data science teacher at Le Wagon, AWS certified machine learning specialist and AWS certified cloud practitioner, consultant and volunteer data scientist, and former Brazilian Navy officer.

Who am I? 😬

I am a self-disciplined, resilient, and ethical person.
I have a military background and a very curious and active mind.
Some time ago, I fell in love with data science and, since then, I've been focusing my energy and time on projects to solve business challenges using data science concepts and tools.

What analytical tools and skills do I use in my projects? 🛠

Data Collection and Storage: SQL, Postgres, MySQL, SQLite, Elasticsearch, MongoDB.
Coding: Python, Spark.
Development: Git, GitHub, GitLab, Linux, macOS, continuous integration, continuous deployment.
Statistics: descriptive statistics, cohort analysis, inferential statistics, causal inference, A/B testing, survival analysis.
Data Visualization: matplotlib, seaborn, plotly, Metabase, Power BI, Streamlit.
Data Manipulation: data cleaning, feature engineering, data preparation, dimensionality reduction, addressing class imbalance, feature selection, model tuning.
Machine Learning and Deep Learning: classification, regression, clustering, NLP, multi-agent systems.
Model Analysis: performance metrics evaluation, model explainability.
APIs: Flask, FastAPI.
Machine Learning Deployment: Streamlit Cloud, Docker, MLflow, Airflow, Telegram.
Cloud Computing: Amazon Web Services (AWS - EC2, RDS, Lambda, DynamoDB, OpenSearch, S3, SageMaker); Google Cloud Platform (GCP - Cloud Storage, Compute Engine, Container Registry, Cloud Run, AI Platform).

AWS Certifications:

Links:

Data Science Projects:

Bottomline

We all live in a society that produces an overwhelming amount of information daily. Information per se is valuable, but it is often very challenging to spotlight the essential part of it - the bottomline, so to say. This mental filtering can be time-consuming and sometimes confusing. Our technical solution is an automated service that identifies a text's most relevant sentences to summarize it. The service also reports the general sentiment (positive, neutral or negative) of the text. In other words, the final product gives the user a general idea of the text's content as well as its most prominent sentiment.

Fraud Detection

Blocker Fraud Company is a company specialized in detecting fraud in financial transactions made through mobile devices. The company is expanding in Brazil and, to find new customers more quickly, it has adopted a very aggressive strategy. The strategy works as follows: (1) the company will receive 25% of each transaction value that was correctly detected as fraud; (2) the company will receive 5% of each transaction value that was detected as fraud despite being legitimate; (3) the company will return 100% of each transaction value that was detected as legitimate despite being fraud. The final solution includes a Power BI reporting dashboard with answers to business questions, a Docker container with an API implementation built with FastAPI and PySpark, and a MongoDB database where API requests are saved for future analyses.
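The three fee rules above can be turned into a simple net-revenue calculation. The sketch below is only an illustration: the `Transaction` class, the `blocker_fraud_revenue` function and the example transactions are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float        # transaction value
    is_fraud: bool       # ground truth
    flagged_fraud: bool  # model decision


def blocker_fraud_revenue(transactions):
    """Net revenue under the three rules described above."""
    revenue = 0.0
    for t in transactions:
        if t.flagged_fraud and t.is_fraud:
            revenue += 0.25 * t.amount   # (1) fraud correctly detected
        elif t.flagged_fraud and not t.is_fraud:
            revenue += 0.05 * t.amount   # (2) legitimate, flagged as fraud
        elif not t.flagged_fraud and t.is_fraud:
            revenue -= 1.00 * t.amount   # (3) fraud, flagged as legitimate
        # Legitimate transactions flagged as legitimate: no rule is stated,
        # so this sketch assumes they contribute nothing.
    return revenue


# Hypothetical example data, one transaction per outcome.
example = [
    Transaction(1000.0, is_fraud=True, flagged_fraud=True),
    Transaction(200.0, is_fraud=False, flagged_fraud=True),
    Transaction(500.0, is_fraud=True, flagged_fraud=False),
    Transaction(300.0, is_fraud=False, flagged_fraud=False),
]
net = blocker_fraud_revenue(example)
print(f"Net revenue: {net:.2f}")
```

Under these rules a single missed fraud can outweigh the fees from several detections, which is why recall on the fraud class matters so much for this business model.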

Insiders Project

The All in One Place company is a multi-brand outlet that sells second-line products from several brands at a lower price through e-commerce. Within just one year of operation, the marketing team realized that some customers buy more expensive products with high frequency and contribute a significant portion of the company's revenue. This project aims to determine which customers are eligible to participate in the Insiders program. Once this list is ready, the marketing team will carry out a sequence of personalized and exclusive actions for this group to increase their sales and purchase frequency. The final solution answers business questions, validates business hypotheses, creates a reporting dashboard and implements a solution architecture in the AWS cloud.

Rossmann Store Sales

Rossmann is a company that operates over 3,000 drug stores in 7 European countries. Its product range includes up to 21,700 items and can vary depending on the size and location of the shop. Rossmann store managers need daily sales predictions up to six weeks in advance to plan infrastructure investments in their stores (will the next six weeks' sales be high enough to balance the infrastructure investment?). The final solution is a Telegram bot: the user types the store number and the bot quickly replies with the sales prediction for that store over the next six weeks. If the user wants more detail about this six-week prediction, they can open a data app with an interactive chart of the predicted sales. The data app also includes the entire project overview, which explains how the prediction is made.

Health Insurance Cross-Sell

Insurance All is a health insurance company, and its products team is analyzing the possibility of offering a new product, automobile insurance, to its health insurance clients. As with its health insurance, customers of this new plan would pay an annual fee to be insured by Insurance All in case of a car accident or damage. In this project, I developed a Machine Learning algorithm that increases the number of interested customers contacted by 1,316 and 2,259 for 20,000 and 40,000 sales team contacts, respectively, for estimated revenue increases of US$ 131,600 and US$ 225,900.
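The figures above are consistent with a revenue of US$ 100 per additional interested customer. That per-customer value is not stated in the text; it is inferred here from the reported numbers, as this short check shows.

```python
# Reported figures, keyed by the number of sales team contacts.
extra_customers = {20_000: 1_316, 40_000: 2_259}
reported_revenue = {20_000: 131_600, 40_000: 225_900}

# Implied revenue per additional interested customer (an inference,
# not a number given in the project description).
implied_per_customer = {
    contacts: reported_revenue[contacts] / customers
    for contacts, customers in extra_customers.items()
}

for contacts, value in implied_per_customer.items():
    print(f"{contacts} contacts: {value:.2f} per additional customer")
```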

Data Engineering Projects:

Synthetic Data Ingestion

The idea is to create synthetic data on customer behaviour for two groups of customers: control and treatment. We would generate this behaviour with statistical distributions (e.g. Poisson and Gamma distributions) and ingest both the generated customer behaviour and the distribution parameters into the architecture. The data would flow through the architecture, e.g. a data ingestion layer, a bronze layer, a silver layer, etc. As the output, we would have the customer behaviour data and its statistical distribution blueprint. Then, we could use A/B testing tools to check whether there is a statistically significant difference between the control and treatment groups. Because we know the original distribution of both groups, we know whether they actually differ, so we can check whether the A/B tests give the correct result or not (especially regarding type I and type II errors).
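The check described above can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not the project's pipeline: it uses only Poisson data, a two-sample z-test as a stand-in for "A/B testing tools", and hypothetical rates, sample sizes and function names.

```python
import math
import random
from statistics import NormalDist, mean, variance


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def z_test_p_value(a, b):
    """Two-sided p-value of a two-sample z-test on the means."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(lam_control, lam_treatment, n=200, runs=300,
                   alpha=0.05, seed=0):
    """Share of simulated experiments in which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        control = [sample_poisson(lam_control, rng) for _ in range(n)]
        treatment = [sample_poisson(lam_treatment, rng) for _ in range(n)]
        if z_test_p_value(control, treatment) < alpha:
            rejections += 1
    return rejections / runs


# Same distribution in both groups: every rejection is a type I error,
# so this rate should land near alpha.
type_1_rate = rejection_rate(3.0, 3.0)

# Different distributions: every non-rejection is a type II error.
type_2_rate = 1 - rejection_rate(3.0, 3.5)

print(f"Estimated type I error rate:  {type_1_rate:.3f}")
print(f"Estimated type II error rate: {type_2_rate:.3f}")
```

Because the generating parameters are known, the observed error rates can be compared directly against the test's nominal significance level and expected power.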

Problem Solving Mindset:

Problem Solving Checkpoints

After reading many books, attending many courses and doing a bunch of data science projects, I felt the need to define how I should move from real-world problems to real-world solutions in a structured way. So, the purpose of this brief material is to share my initial summary of how to structure a problem-solving strategy. I emphasize that it is just my initial MVP on this subject. In other words, it is not meant to be a definitive solution, nor to replace any already tested framework! I'm sharing this compilation so that anyone interested in the topic can learn or recall something relevant for solving a real problem: if that happens, I would be delighted!

Medium Posts:

Six lessons I learnt during the last six months of data scientist experience

Applying the Lean Startup mindset to data science projects

Visual intelligence techniques applied on data science projects
