azizbarank/README.md

Hey there 👋, I'm Aziz Baran 👨‍💻

I'm a master's student in AI focusing on Machine Learning and Natural Language Processing.

I'm passionate about constantly improving myself in the fields of Data Science and Machine Learning, with the aim of bringing the most effective solutions to different types of business-related real-world problems.

During my master's education, I realized I would enjoy leveraging AI to drive business impact in BI environments. Therefore, alongside my studies, I'm learning the well-known BI tools Power BI and Tableau to gain insights from data and support business decision-making.

In my spare time, I write posts about my personal experience in NLP and publish them on my GitHub profile in the NLP Tutorials repository.

How to reach me:

Linkedin Badge Email Badge

🛠️ Skills

Languages

Python

Business Intelligence

Power BI Tableau

Natural Language Processing

Hugging Face Transformers

Machine Learning

scikit-learn NumPy Pandas

IDEs & Notebooks

Visual Studio Code Jupyter Google Colab Jupyter Notebook

Other Technologies & Tools

Anaconda GitHub Microsoft Excel Overleaf

📃 Projects

This section is divided into two parts: 'Machine Learning & NLP' projects and 'Data Analysis' projects. Click the links to see the details of each project.


📈 Data Analysis Projects

  • California Infectious Diseases Analysis

    Analyzed a dataset of selected communicable infectious diseases reported in the state of California between 2001 and 2022. The dataset was taken from the official Data.gov website. Using pandas and matplotlib, the analysis looks at the trends of common diseases over the given years and at their distribution by gender and county.

  • Chicago Crime Rate Analysis

    A data analysis project to gain insight into how crime rates fluctuated in Chicago between 2001 and 2024, using the Python libraries pandas, NumPy, matplotlib, and seaborn. It also considers the most common types of crime and the locations where they occur most often. The dataset was taken from Data.gov.

  • Connecticut Real Estate Analysis

    Analyzed real estate sales in the state of Connecticut between 2001 and 2022 using Python and the libraries pandas, NumPy, and matplotlib. The main goals were to discover the general trend in sales amounts, which residence types are most in demand, how expensive the cities are, and, relatedly, which locations are overvalued or undervalued.

  • Amsterdam Airbnb Data Analysis

    An Airbnb market analysis of Amsterdam. The data was taken from Inside Airbnb, and the corresponding CSV files were analyzed with Microsoft Power BI. The main goals were to discover the trend in total listings over the years, to compare the neighborhoods in terms of price, and to identify the most popular hosts.
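
The Python analyses above share the same basic step: grouping records by year or by category and summing a count. A minimal sketch of that step with pandas, assuming a toy table with hypothetical column names (`year`, `category`, `count`); this is not the code from the notebooks, and the numbers are made up for illustration:

```python
import pandas as pd

# Hypothetical toy records; the real projects use datasets from Data.gov.
records = pd.DataFrame({
    "year": [2001, 2001, 2002, 2002, 2002],
    "category": ["A", "B", "A", "A", "B"],
    "count": [10, 5, 7, 3, 8],
})

# Summing per year gives the overall trend across the period.
yearly_totals = records.groupby("year")["count"].sum()

# Summing per category and sorting shows which categories are most common.
top_categories = (
    records.groupby("category")["count"].sum().sort_values(ascending=False)
)

print(yearly_totals)
print(top_categories)
```

Calling `yearly_totals.plot()` would then draw the trend line with matplotlib.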


🤖 Machine Learning & NLP Projects

  • Turkish Sentiment Analyser - Hugging Face - Web App

    Fine-tuned the distilled Turkish BERT model on a review classification dataset for sentiment analysis. The final model achieved 86% accuracy and was deployed to Hugging Face Spaces using Streamlit as an interactive web app. The app provides a no-code way for people to see whether a particular review is "positive" or "negative".

  • Toxic Comment Detector - Web App

    A binary classification project to predict whether a comment is toxic. Three machine learning models were compared: Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine. The best model was a Naive Bayes classifier with a TF-IDF vectorizer, with F1 and recall scores of 0.85 and 0.88, respectively. The application uses this model to predict the toxicity of comments.

  • cst5 - Hugging Face

    cst5 is a tiny T5 model for the Czech language, based on the smaller version of Google's mT5 model. It is meant to help people run experiments for Czech with a lightweight model rather than the massive mT5, which covers 101 languages. cst5 was obtained by retaining only the Czech and English embeddings of the mT5 model. Shrinking the original "sentencepiece" vocabulary from 250K to 30K tokens reduced the parameters from 582M to 244M and the total size from 2.2GB to 0.9GB. cst5 thus allows fine-tuning for downstream tasks in Czech with a smaller footprint and without any loss in quality from the original multilingual model.

  • Financial Sentiment Analysis with Machine Learning, LSTM, and BERT Transformer

    A financial sentiment analysis project to predict whether a given financial text is positive, negative, or neutral. Machine learning models, an LSTM, and a BERT transformer were compared. BERT gave the best result, with an accuracy of 0.77.
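
The cst5 size reduction described above can be sanity-checked with a back-of-the-envelope calculation. This is a rough sketch, not the script used to build the model. It assumes the mT5-base configuration (hidden size 768, a 250,112-token vocabulary, separate input-embedding and output-projection matrices) and float32 weights; none of these are stated in this README, and the 30K trimmed vocabulary is taken as exactly 30,000 tokens.

```python
# Assumed mT5-base configuration; these values are not stated in this README.
D_MODEL = 768                  # hidden size (assumption)
ORIGINAL_VOCAB = 250_112       # "250K" tokens (assumed exact size)
TRIMMED_VOCAB = 30_000         # "30K" tokens (assumed exact size)
ORIGINAL_PARAMS = 582_000_000  # "582M" parameters, as quoted above
EMBEDDING_MATRICES = 2         # assumption: untied input and output embeddings
BYTES_PER_PARAM = 4            # assumption: float32 weights


def embedding_params(vocab_size: int) -> int:
    """Parameters held in the vocabulary-sized matrices."""
    return EMBEDDING_MATRICES * vocab_size * D_MODEL


def size_gib(num_params: int) -> float:
    """Approximate size of the weights in GiB."""
    return num_params * BYTES_PER_PARAM / 2**30


# Only the vocabulary-sized matrices shrink; the rest of the model is unchanged.
removed = embedding_params(ORIGINAL_VOCAB) - embedding_params(TRIMMED_VOCAB)
remaining = ORIGINAL_PARAMS - removed

print(f"{remaining / 1e6:.0f}M parameters remain")  # prints "244M parameters remain"
print(f"{size_gib(ORIGINAL_PARAMS):.1f} GiB -> {size_gib(remaining):.1f} GiB")
```

Under these assumptions the result matches the 244M parameters quoted above, which suggests that almost all of the savings come from the embedding matrices.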


💻 My Posts about NLP

Pinned

  1. California-Infectious-Diseases-Analysis Public

    Data analysis of selected communicable infectious diseases reported for California residents between 2001-2022.

    Jupyter Notebook 1

  2. Chicago-Crime-Rate-Analysis Public

    Analysis of trends in crime rates in Chicago between 2001-2024

    Jupyter Notebook 1

  3. Turkish-Sentiment-Analyser Public

    This project fine-tunes the distilled Turkish BERT model on a review dataset for doing sentiment analysis. After the fine-tuning, Hugging Face Spaces and Streamlit are used to deploy the final mode…

    Jupyter Notebook

  4. Toxic-Comment-Detector Public

    This project applies classification models with the aim of automating the detection of toxic comments on social media. After choosing the model with the best performance, HuggingFace + Streamlit ar…

    Jupyter Notebook 1

  5. Czech-T5-Base-Model Public

    This is the t5 base model for the Czech that is based on the smaller version of the google/mt5-base model. To make this model, I retained only the Czech and some of the English embeddings from the …

    Jupyter Notebook 2

  6. Financial-Sentiment-Analysis-with-Machine-Learning-LSTM-and-BERT-Transformer Public

    This project applies three main methods to make sentiment analysis on financial data: Machine Learning, LSTM using TensorFlow with Keras API, and BERT Transformer using the "simpletransformers" lib…

    Jupyter Notebook