ds-gustavo-cunha/README.md

Gustavo Cunha

I built a portfolio webpage to display my projects in a more straightforward and personal way. You can visit it at https://ds-gustavo-cunha.github.io/projects-portfolio/. If you prefer, a brief introduction follows below. 😉

The main objective of this personal portfolio is to demonstrate my skills in solving Data Science business challenges.



Data scientist at Nubank, AI mentor at Social Good Brasil, AI & data science teacher at Le Wagon, AWS certified machine learning specialist and AWS certified cloud practitioner, consultant and volunteer data scientist, former Brazilian Navy officer.


Who am I? 😬

  • I am a self-disciplined, resilient, and ethical person.

  • I have a military background and a very curious and active mind.

  • Some time ago, I fell in love with data science and, since then, I've been focusing my energy and time on projects to solve business challenges using data science concepts and tools.


What analytical tools and skills do I use in my projects? 🛠
  • Data Collection and Storage: SQL, Postgres, MySQL, SQLite, ElasticSearch, MongoDB.

  • Coding: Python, Spark.

  • Development: Git, GitHub, GitLab, Linux, macOS, continuous integration, continuous deployment.

  • Statistics: descriptive statistics, cohort analysis, inferential statistics, causal inference, A/B testing, survival analysis.

  • Data Visualization: matplotlib, seaborn, plotly, Metabase, Power BI, Streamlit.

  • Data Manipulation: data cleaning, feature engineering, data preparation, dimensionality reduction, addressing class imbalance, feature selection, model tuning.

  • Machine Learning and Deep Learning: Classification, regression, clustering, NLP, multi-agent systems.

  • Model Analysis: performance metrics evaluation, model explainability.

  • APIs: Flask, FastAPI.

  • Machine Learning Deployment: Streamlit Cloud, Docker, MLflow, Airflow, Telegram.

  • Cloud Computing: Amazon Web Services (AWS - EC2, RDS, Lambda, DynamoDB, OpenSearch, S3, SageMaker); Google Cloud Platform (GCP - Cloud Storage, Compute Engine, Container Registry, Cloud Run, AI Platform).


AWS Certifications:

Linkedin Badge

Medium Badge


Links:

Linkedin Badge

Medium Badge

Gmail Badge


Data Science Projects:

We all live in a society that produces an overwhelming amount of information daily. Information per se is valuable, but it is often challenging to spotlight its essential part: the bottom line, so to speak. This mental filtering can be time-consuming and sometimes confusing. Our solution is an automated service that identifies a text's most relevant sentences in order to summarize it. The service also reports the text's overall sentiment (positive, neutral or negative). In other words, the final product gives the user a general idea of the text's content as well as its predominant sentiment.
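As a rough illustration of the "most relevant sentences" idea, here is a minimal sketch of frequency-based extractive summarization. It is a stand-in technique chosen for simplicity, not necessarily the method used in the project, and it leaves the sentiment step out. The names `STOPWORDS`, `split_sentences` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real project would use a fuller one).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "was",
             "it", "that", "this", "for", "on", "with", "as", "are", "be"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order.

    A sentence's score is the average document-level frequency of its tokens.
    """
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

Calling `summarize(article_text, max_sentences=3)` would return up to three sentences from `article_text`; a sentiment classifier would then run on the same text as a separate step.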

Blocker Fraud Company specializes in detecting fraud in financial transactions made through mobile devices. The company is expanding in Brazil and, to acquire new customers more quickly, it has adopted a very aggressive strategy, which works as follows: (1) the company receives 25% of the value of each transaction correctly detected as fraud; (2) the company receives 5% of the value of each transaction detected as fraud despite being legitimate; (3) the company returns 100% of the value of each transaction detected as legitimate despite being fraud. The final solution includes a Power BI reporting dashboard that answers business questions, a Docker container with an API built with FastAPI and PySpark, and a MongoDB database where API requests are saved for future analysis.
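The three pricing rules can be sketched as a small payoff function. This is only an illustration of the arithmetic: the `Transaction` class and function names are hypothetical, and it assumes that a legitimate transaction correctly passed as legitimate yields nothing, which the rules above do not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    amount: float           # transaction value
    is_fraud: bool          # ground truth
    flagged_as_fraud: bool  # model prediction


def transaction_payoff(tx):
    """Company payoff for one transaction under the three pricing rules."""
    if tx.is_fraud and tx.flagged_as_fraud:
        return 0.25 * tx.amount   # rule 1: fraud correctly detected
    if not tx.is_fraud and tx.flagged_as_fraud:
        return 0.05 * tx.amount   # rule 2: legitimate flagged as fraud
    if tx.is_fraud and not tx.flagged_as_fraud:
        return -1.00 * tx.amount  # rule 3: missed fraud is refunded in full
    return 0.0                    # assumption: no payoff for correctly passed transactions


def total_payoff(transactions):
    """Sum of the per-transaction payoffs."""
    return sum(transaction_payoff(tx) for tx in transactions)
```

Because a missed fraud costs four times what a caught fraud of the same value earns, a model evaluated this way is pushed toward high recall on the fraud class.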

The All in One Place company is a multi-brand outlet that sells second-line products from several brands at lower prices through e-commerce. After just one year of operation, the marketing team realized that some customers buy more expensive products with high frequency and account for a significant portion of the company's revenue. This project aims to determine which customers are eligible for the Insiders program. Once this list is ready, the marketing team will carry out a sequence of personalized, exclusive actions for this group to increase their sales and purchase frequency. The final solution answers business questions, validates business hypotheses, creates a reporting dashboard and implements a solution architecture in the AWS cloud.

Rossmann is a company that operates over 3,000 drug stores in 7 European countries. Its product range includes up to 21,700 items and can vary depending on the size and location of the shop. Rossmann store managers need daily sales predictions up to six weeks in advance to plan infrastructure investments in their stores (will the next six weeks' sales be high enough to balance the investment?). The final solution is a Telegram bot: the user types a store number and the bot quickly replies with the sales prediction for that store over the next six weeks. Users who want more detail can open a data app with an interactive chart of the six-week sales prediction. The data app also contains the entire project overview, which explains how the prediction is made.

Insurance All is a health insurance company whose product team is analyzing the possibility of offering a new product, automobile insurance, to its health insurance clients. As with its health insurance, customers of this new plan would pay annually to be insured by Insurance All against car accidents or damage. In this project, I developed a machine learning algorithm that increases the number of interested customers reached by 1,316 and 2,259 when the sales team makes 20,000 and 40,000 contacts, respectively, for estimated revenue increases of US$ 131,600 and US$ 225,900.
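The two revenue figures can be cross-checked with simple arithmetic. In both scenarios the ratio of revenue to additional customers comes out at US$ 100; that per-customer value is inferred from the numbers above, not stated in the project description, and the names below are hypothetical.

```python
# Reported results, keyed by number of sales contacts:
# (additional interested customers reached, estimated revenue increase in US$)
SCENARIOS = {
    20_000: (1_316, 131_600),
    40_000: (2_259, 225_900),
}


def implied_revenue_per_customer(extra_revenue, extra_customers):
    """Revenue per additional customer implied by the reported totals."""
    return extra_revenue / extra_customers


def extra_customers_per_1000_contacts(extra_customers, contacts):
    """Additional interested customers gained per 1,000 sales contacts."""
    return 1000 * extra_customers / contacts


for contacts, (customers, revenue) in SCENARIOS.items():
    print(contacts,
          implied_revenue_per_customer(revenue, customers),
          extra_customers_per_1000_contacts(customers, contacts))
```

The gain per 1,000 contacts is lower in the 40,000-contact scenario than in the 20,000-contact one, so the additional benefit shrinks as the contact list grows.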


Data Engineering Projects:

The idea is to create synthetic data on customer behaviour for two groups of customers: control and treatment. We would generate this behaviour from statistical distributions (e.g. Poisson and Gamma distributions) and ingest both the generated customer behaviour and the distribution parameters into the architecture. The data would flow through the architecture, e.g. a data ingestion layer, a bronze layer, a silver layer, etc. As output, we would have the customer behaviour data and its statistical distribution blueprint. Then, we could use A/B testing tools to check whether there is a statistically significant difference between the control and treatment groups. Because we know the original distribution of both groups, we know whether they really differ, so we can check whether the A/B tests give the correct result or not (especially regarding type I and type II errors).
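The validation step at the end of this idea can be sketched with the standard library alone. The sketch below draws both groups from Gamma distributions with known parameters, applies a large-sample two-sided z-test on the difference in means (a test picked for illustration, not necessarily the one used in the project), and counts how often the null hypothesis is rejected. All function names, sample sizes and parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean, variance


def simulate_group(rng, n, shape, scale):
    """Draw n synthetic customer values from Gamma(shape, scale)."""
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def two_sided_p_value(control, treatment):
    """Large-sample z-test for a difference in means (unequal variances)."""
    se = (variance(control) / len(control)
          + variance(treatment) / len(treatment)) ** 0.5
    z = (fmean(treatment) - fmean(control)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(control_params, treatment_params,
                   n=200, runs=400, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which the test rejects the null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        control = simulate_group(rng, n, *control_params)
        treatment = simulate_group(rng, n, *treatment_params)
        if two_sided_p_value(control, treatment) < alpha:
            rejections += 1
    return rejections / runs
```

With identical parameters for both groups, e.g. `rejection_rate((2.0, 10.0), (2.0, 10.0))`, the result estimates the type I error rate and should sit near `alpha`. With different parameters, e.g. `rejection_rate((2.0, 10.0), (2.0, 13.0))`, it estimates the power, and one minus that value estimates the type II error rate.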


Problem Solving Mindset:

After reading many books, attending many courses and doing a number of data science projects, I felt the need to define how I should move from real-world problems to real-world solutions in a structured way. So, the purpose of this brief material is to share my initial summary of how to structure a problem-solving strategy. I emphasize that it is just my initial MVP on this subject. In other words, it is not meant to be a definitive solution, nor to replace any already tested framework! I'm sharing this compilation so that anyone interested in the topic can learn or recall something useful for solving a real problem. If that happens, I will be delighted!


Medium Posts:

Sharing lessons learned over my last six months of experience as a data scientist.

Applying some core concepts of “The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses” by Eric Ries to make data science projects more effective.

Applying techniques from “Visual Intelligence: Sharpen Your Perception, Change Your Life” by Amy E. Herman to data science projects.

Popular repositories

  1. Bottomline-Project Public

    Python 2 1

  2. Rossmann-Store-Sales Public

    This repository contains the code to predict sales six weeks in advance.

    Jupyter Notebook 1

  3. Insiders-Project Public

    This repository contains the code to support a company's loyalty program for its high-value customers.

    Jupyter Notebook 1

  4. ds-gustavo-cunha Public

  5. Health-Insurance-Cross-Sell Public

    This repository contains the code to help an insurance company cross-sell its products.

    Jupyter Notebook

  6. Fraud-Detection Public

    This repository contains the code to support a fraud detection solution for a company specialized in mobile device transactions.

    Jupyter Notebook