Lucas-Okamura/README.md


Hi, how are you? My name is Lucas Okamura! 👋

About me

I am a mechanical engineer, graduated from the University of São Paulo, and I work as a Data Scientist at Mercado Livre. I have been interested in programming since the beginning of my degree, which led me to look for knowledge beyond the basics taught in class through online courses and hackathons. I seek to work in areas involving AI, Data Science and Machine Learning, using predictive modeling techniques to analyze data and obtain insights for solving different business scenarios. In my GitHub profile you will find my personal projects, each developing a business solution with the concepts and tools of Data Science, from understanding the business to publishing the model in production using APIs.

Analytical Tools:

  • Data Collection and Storage: SQL, MySQL, Hadoop, Spark.
  • Data Processing and Analysis: Python, SQL.
  • Development: Git and Shell Script.
  • Data Visualization: Matplotlib, Plotly, Seaborn and Tableau.
  • Machine Learning Modeling: Classification, Regression and Clustering.
  • Machine Learning Deployment: Heroku, AWS RDS, AWS EC2, AWS S3.

Data Science Projects:

  • Business problem addressed: Eduardo and Marcelo are two Brazilians, friends and business partners. After several successful businesses, they are planning to enter the fashion market with an e-commerce business model. The initial idea is to enter the market with just one product for a specific audience: jeans for men. The objective is to keep operating costs low and scale as they gain customers.
  • This project went through the entire data pipeline, from extraction to the architecture of an Airflow automation.
  • Rossmann store sales forecast for the next 6 weeks. Rossmann's CFO needs each store's predicted revenue in advance in order to budget store renovations.
  • The final model is an XGBoost Regressor, which obtained a MAPE of 9.81% and predicted a total income of $285 million across all stores.
  • A health insurance company is analyzing the possibility of offering its clients a new product: car insurance. As with the health insurance, clients of this new product would pay annually to have a certain value for their cars insured by the company. The company therefore needs a strategy to select the customers most likely to be interested, so it can call them and offer the new product.
  • The final model is a Logistic Regression, which is roughly 2.5 times better than the random baseline model, finding 62.28% of the interested customers within the company's capacity to make calls.
  • All in One Place is a multibrand outlet company, i.e., it sells second-line products from various brands at a lower price through an e-commerce platform. After 1 year of operation, the marketing team realized that some customers in its base buy more expensive products with high frequency and end up contributing a significant portion of the company's revenue. Based on this perception, the marketing team will launch a loyalty program for the best customers in the base, called Insiders. However, the team does not have advanced data analysis knowledge to choose the program participants. For this reason, the marketing team asked the data team to select eligible customers for the program, using advanced data manipulation techniques.
  • The final model is a K-Means with Random Forest embeddings, finding 8 clusters, which were loaded into a table in an AWS database.
  • Electronic House is an e-commerce company that sells computer products for homes and offices. The Director of Global Products asked the Head of Design to develop a new way to finalize a purchase with a credit card, without the need to manually fill in all the credit card information, that would work in all countries. After months of development, the Backend Development team delivered a payment solution in which 90% of the information was filled in automatically. The Head of Design would like to measure the effectiveness of the new automatic filling of credit card data on the sales page and report the results to the Director of Global Products, to conclude whether the new payment method is really better than the old one.
  • Hypothesis testing was used to identify a possible increase in value per customer on the site with this new feature. However, there was no statistical evidence that the new feature increased the value spent by customers on the site.
  • In a proptech, there is a squad responsible for defining which apartments the proptech should list on its platform or bet on buying, in order to allow sales growth at an exponential pace, good use of financial resources and healthy unit economics. The policies created by this squad directly influence the liquidity and risk of the company's portfolio, both from a financial perspective (losses for the proptech) and in terms of the risk of compromising the user experience (poorly calibrated prices). The portfolio targets set by this squad cascade across the company, so it is critical that the models supporting these settings are accurate and that the portfolio strategy adopted follows sound logic. The challenge is to create a portfolio allocation algorithm to decide, among the apartments available in target_apartments.csv, which ones the proptech should buy, refurbish and sell.
  • A survival analysis was performed, identifying the relationship between price variation and liquidity of apartments, to tell which are the best apartments to buy and at which price to sell them. In the end, by buying the apartments with the best price / liquidity ratio, a profit of 72.62%, or R$ 108,604,871.00, can be obtained in relation to the amount spent on purchases.
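
The car insurance result above is stated as a share of interested customers found within a fixed call capacity, compared against a random baseline. A minimal sketch of how such a comparison can be computed is shown below. It is not the code from the project: the function name `recall_and_lift_at_k` and the toy data are hypothetical, and it assumes the model outputs one score per customer and that the random baseline simply calls `k` customers picked at random.

```python
import numpy as np

def recall_and_lift_at_k(y_true, scores, k):
    """Return (recall@k, lift@k) when calling the k highest-scored customers.

    Hypothetical helper for illustration only.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # Sort customers from highest to lowest predicted interest.
    order = np.argsort(-scores, kind="stable")
    top_k = y_true[order[:k]]
    # Share of all interested customers that the k calls would reach.
    recall = top_k.sum() / y_true.sum()
    # A random call list of size k is expected to reach k / N of them.
    baseline = k / len(y_true)
    return recall, recall / baseline

# Toy data (made up): 1 = interested in car insurance, 0 = not interested.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.6, 0.05, 0.15, 0.7]

recall, lift = recall_and_lift_at_k(y_true, scores, k=4)
print(f"recall@4 = {recall:.2f}, lift = {lift:.3f}")
```

With this definition, a lift of about 2.5 at a recall of 62.28% would correspond to calling roughly a quarter of the customer base, assuming both figures refer to the same call list.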

Skills

MySQL

Python

pandas

NumPy

Matplotlib

seaborn

scikit-learn

SciPy

Flask

statsmodels

Git / GitHub

Heroku

AWS

Connect with me

LinkedIn     Instagram     Gmail

Popular repositories

  1. RossmannStoreSales (Public)

    Data Science Project to predict sales of Rossmann Store.

    Jupyter Notebook 5 2

  2. Hackacthon-Visagio (Public)

    Machine Learning models developed for Visagio Hackathon.

    HTML 2 1

  3. Instagram-Bot (Public)

    Instagram Bot that makes endless comments or store comments! Win Instagram draws!

    Python 2

  4. Lucas-Okamura (Public)

    2 1

  5. MNIST-Machine-Learning (Public)

    Mathematical machine learning model for recognizing handwritten digits from the MNIST dataset. Developed in 2019 for an assignment in the Mechanical Engineering course at Escola Polit…

    Python 1

  6. Algoritmos-de-Otimizacao (Public)

    Code developed to optimize production problems, based mainly on heuristics. Guided by an elective course at the university.

    Jupyter Notebook 1