I'm Ana Claudia, or just Ana :)
I'm motivated by getting answers, solving problems, and understanding how everything works. I like the idea of turning a heap of data into meaningful, accessible, and impactful information, which led me to develop as a data scientist 👩‍💻
Main Skills: Data Analysis, Statistics for Data Analysis, SQL, Tableau, Python and Main Data Science Libraries, Machine Learning (Classification, Regression, Clustering and Time Series).
Business Problem: A traditional health insurance company wants to expand its business by offering car insurance to its customers. Through a survey, it collected feedback from 380,000 customers on whether they are interested in acquiring car insurance. The product team is planning a campaign to offer the product to 127,000 new customers who did not respond to the survey, but it cannot reach the entire customer base: the offer will be made by telephone, and the sales team has the capacity to make only 20,000 calls during the campaign period.
Solution: Develop a model that estimates the probability that each customer on a list will acquire car insurance, and sort the list from the highest to the lowest probability.
Conclusion: By making 20,000 calls from a list of 127,000 customers, the model would identify 44% of all customers interested in acquiring car insurance, which makes it approximately 2.7 times better than the solution used before. With 40,000 calls, the model would identify 75% of all interested customers, approximately 2.3 times better than the solution used before. To reach 90% of interested customers with the ML model, the sales team would need to make only 53,340 calls, which represents 42% of the entire customer list, a 48% reduction in the number of calls compared to the solution used before.
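The ranking idea and the "times better" figures above can be illustrated with a minimal sketch. It assumes the previous solution is equivalent to calling customers in random order, which is an assumption made here for illustration. The scores, labels, and function names are hypothetical and are not taken from the repository.

```python
def rank_customers(scores):
    """Return customer indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def gain_and_lift_at_k(scores, interested, k):
    """Cumulative gain and lift when calling only the top-k ranked customers.

    gain: share of all interested customers found within the top k.
    lift: gain divided by the share a random call order would be
          expected to find (k / list size). The random baseline is an
          assumption of this sketch.
    """
    top_k = rank_customers(scores)[:k]
    found = sum(interested[i] for i in top_k)
    gain = found / sum(interested)
    baseline = k / len(scores)
    return gain, gain / baseline


# Hypothetical toy data: model scores and true interest (1 = interested).
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.05, 0.6, 0.4, 0.15]
interested = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1]

gain, lift = gain_and_lift_at_k(scores, interested, k=4)
print(f"gain@4 = {gain:.2f}, lift@4 = {lift:.2f}")
```

In this toy list, calling the 4 highest-scored customers out of 10 finds 3 of the 4 interested ones. The same two ratios, computed at 20,000 and 40,000 calls, are what the conclusion above reports.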
Repository: Learning to Rank for Insurance Cross Sell
Business Problem: Rossmann is a pharmacy chain that operates over 3,000 stores in 7 European countries. The stores are going to be renovated, and the CFO needs to know how much can be invested in each of them.
Solution: Develop a sales prediction model and a Telegram bot that returns the sales prediction for a given store ID.
Conclusion: The model predicts a gross income of 285,707,584.00 USD over the next 6 weeks for the available stores, with best-case and worst-case scenarios of 286,423,764.87 USD and 284,991,409.31 USD, respectively.
Repository: Sales Forecasts for a Drugstore Chain
Business Problem: House Rocket's business model consists of purchasing and reselling properties through a digital platform. The company is looking for new properties for its portfolio. The data scientist is in charge of helping to find the best business opportunities by answering these questions: Which properties should the company buy? Once a property is in the company's possession, how long should it wait before selling, and what should the sale price be?
Solution: Develop an online dashboard with the selected properties, a map view of the property distribution, a table with attribute filters, and the expected profit for each property.
Conclusion: Based on commercial criteria, 8,130 properties are recommended for purchase by House Rocket, resulting in a profit of 702,080,905.28 USD, which represents 20.33% of the total investment. This result already accounts for repair and renovation expenses.
Repository: King County Housing Market Insights
Business Problem: Mariana and Laura are planning to enter the US fashion market with an e-commerce business. The initial idea is to enter the market with only one product for a specific audience, in this case jeans for women. However, the partners have no experience in this market, so they hired a Data Science consultancy to answer the following questions: What would be the median sales price of the products? Which types and colors of jeans should the initial products include? What materials are the products made of? The main competitors in the business are H&M and Macy's.
Solution: Follow these steps: ETL architecture design, web scraping, data cleansing, saving the data to a database, data analysis, and delivery of answers and insights in a PDF report.
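The cleansing and database steps listed above might look like the following minimal sketch. The raw records stand in for what a scraper could return, and the field names, table schema, and use of an in-memory SQLite database are all assumptions made for illustration, not the project's actual pipeline.

```python
import sqlite3

# Hypothetical raw records standing in for scraped product data.
raw_records = [
    {"product_id": "001", "name": " Slim Jeans ", "price": "$ 29.99", "color": "Dark Blue"},
    {"product_id": "002", "name": "Loose Jeans", "price": "$ 34.99", "color": "black"},
    # Duplicate of the first record, as often happens when pages overlap.
    {"product_id": "001", "name": " Slim Jeans ", "price": "$ 29.99", "color": "Dark Blue"},
]


def clean(record):
    """Normalize text fields and convert the price string to a float."""
    return {
        "product_id": record["product_id"],
        "name": record["name"].strip().lower(),
        "price": float(record["price"].replace("$", "").strip()),
        "color": record["color"].strip().lower().replace(" ", "_"),
    }


def load(records, conn):
    """Create the table if needed and insert records, skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "product_id TEXT PRIMARY KEY, name TEXT, price REAL, color TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO products "
        "VALUES (:product_id, :name, :price, :color)",
        records,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load([clean(r) for r in raw_records], conn)

rows = conn.execute(
    "SELECT product_id, name, price, color FROM products ORDER BY product_id"
).fetchall()
print(rows)
```

Using the product ID as the primary key with `INSERT OR IGNORE` is one simple way to keep repeated scraping runs from storing the same product twice.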
Conclusion: The median price of the competitors' products is 29.99 USD, and 75% of the products in the dataset are priced between 17.99 USD and 34.99 USD. Of the 24 styles present in the dataset, 8 represent 80% of the products. As for fit, 80% of the products have a loose, regular, or slim fit. Of the 46 colors available, 7 represent 80% of the products. Jeans that are not 100% cotton are most often also made of spandex and polyester. Almost 80% of the products have some percentage of material that is considered environmentally responsible.
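Statements such as "7 colors represent 80% of the products" come from counting how many of the most frequent categories are needed to cover a target share of the dataset. The sketch below shows one way to compute this. The function name and the toy color list are hypothetical and are not taken from the repository.

```python
from collections import Counter


def categories_for_coverage(values, target=0.8):
    """Return the most frequent categories that together cover `target` of the values."""
    total = len(values)
    covered = 0
    selected = []
    # most_common() yields (category, count) pairs from most to least frequent.
    for category, count in Counter(values).most_common():
        selected.append(category)
        covered += count
        if covered / total >= target:
            break
    return selected


# Hypothetical toy data: one color label per product.
colors = ["blue"] * 5 + ["black"] * 3 + ["white"] + ["grey"]

top_colors = categories_for_coverage(colors)
print(top_colors)
```

In this toy list, two of the four colors are enough to cover 80% of the products. The same function could be applied to style or fit labels.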
Repository: Scraping and Analysis of Fashion Trends and Pricing