Daniele Polotow danielepolotow

Hi there 👋

Daniele Polotow - [Data Scientist] 👋

Data Scientist @ Gupy - Python, Machine Learning, Statistics

Daniele works as a data scientist, applying natural language processing techniques to feed classification models and build efficient applications. She has strong statistical training and a passion for data visualization and storytelling. Coming from a research background, she worked mostly with unsupervised machine learning, modeling the evolution of genes.

Connect with me:

Languages and Tools:

Data Science Portfolio

This repository contains a portfolio of data science projects that I completed for academic, self-learning, and hobby purposes. The projects are coded in Python and presented as Jupyter Notebooks.

Languages: English and Portuguese.

Machine Learning
- Spider Image Classification with TensorFlow: A supervised learning model built with Keras and TensorFlow. This is an ongoing project that uses deep learning to identify SEM (scanning electron microscope) images of spiders. The images are divided into 6 categories (chelicerae, eyes, legs, palp, spinnerets, and trichobothria), which are used to describe important features in spider morphology. Because the project is still ongoing, the images used will not be made public in this repository.
- Clustering the Big 5 Personalities: In this notebook I present a K-Means Clustering model, built with Python and Scikit-learn, that sorts test responses into groups. K-Means is an unsupervised learning algorithm that groups responses by similarity. The fitted model can then assign new responses to one of the groups. Finally, I created an interface that collects a user's answers to the test and returns the model's prediction.
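
To illustrate how K-Means groups responses by similarity, here is a minimal sketch of the basic algorithm in plain Python. It is not the notebook's code, which uses Scikit-learn. The function names `kmeans` and `nearest`, the sample responses, and the choice of the first `k` points as starting centroids are all assumptions made for illustration (libraries such as Scikit-learn use smarter seeding by default).

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, n_iter=100):
    """Group points (lists of numbers) into k clusters.

    Assumption for this sketch: the first k points are the starting centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Invented answers to two test items on a 1-5 scale (not real survey data).
responses = [[1, 1], [5, 5], [1, 2], [5, 4], [2, 1], [4, 5]]
centroids, labels = kmeans(responses, k=2)

# A new response is assigned to the group with the nearest centroid.
new_group = nearest([4, 4], centroids)
```

With these six invented responses, the low scorers and the high scorers end up in separate groups, and the new response `[4, 4]` joins the high-scoring group.
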
Data Visualization and Exploration
- The famous Titanic Dataset: In this notebook, we will analyze the famous Titanic dataset, available on Kaggle. The dataset is intended for supervised machine learning, but we'll just do some exploratory analysis here.
- Cars Dataset: Data visualization and exploration of this famous dataset of cars from the 1970s and 1980s, along with their prices and features.
- NBA data with nba_api: Basic steps to access data in this huge API.
Bioinformatics
- SARS-coronavirus-3C-like-proteinase - Bioactivity Data Analysis: In this project, I explore the ChEMBL database and analyze data related to the SARS coronavirus 3C-like proteinase. I selected molecules with the same bioactivity unit type (in this case, standard_type="IC50", the concentration required for 50% inhibition of the target protein) and then labeled the compounds as active, inactive, or intermediate. After that, I calculated the Lipinski descriptors, which are used to evaluate a compound's drug-likeness based on Absorption, Distribution, Metabolism and Excretion (ADME), also known as the pharmacokinetic profile. Finally, I used a function to test whether the active and inactive molecules have significantly different distributions.
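
The filtering and labeling steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the cutoffs of 1,000 nM and 10,000 nM, the function name `label_bioactivity`, and the sample records are all invented here, and the field names are assumed to follow the shape of a ChEMBL activity export.

```python
def label_bioactivity(ic50_nm, active_max=1000.0, inactive_min=10000.0):
    """Label a compound from its IC50 in nM; a lower IC50 means higher potency.

    The default cutoffs are assumptions for this sketch, not values from the project.
    """
    if ic50_nm <= active_max:
        return "active"
    if ic50_nm >= inactive_min:
        return "inactive"
    return "intermediate"


# Hypothetical records (IDs and values are invented, not real ChEMBL data).
records = [
    {"molecule_chembl_id": "EXAMPLE_1", "standard_type": "IC50", "standard_value": 250.0},
    {"molecule_chembl_id": "EXAMPLE_2", "standard_type": "IC50", "standard_value": 4000.0},
    {"molecule_chembl_id": "EXAMPLE_3", "standard_type": "IC50", "standard_value": 25000.0},
    {"molecule_chembl_id": "EXAMPLE_4", "standard_type": "Ki", "standard_value": 90.0},
]

# Keep only measurements reported as IC50 so the values are comparable.
ic50_only = [r for r in records if r["standard_type"] == "IC50"]

# Map each remaining compound to its bioactivity class.
classes = {
    r["molecule_chembl_id"]: label_bioactivity(r["standard_value"])
    for r in ic50_only
}
```

Keeping the cutoffs as parameters makes it easy to check how sensitive the active/inactive comparison is to where the lines are drawn.
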
