lsgordon/README.md

Leo Gordon

lsgordon@seas.upenn.edu | github.com/lsgordon

Hello!

My name is Leo. I'm a first-year graduate student at UPenn and a senior at Haverford College. I'm interested in machine learning, data engineering, and data science. The projects below are the best examples of my work. Feel free to reach out with any questions.


Highlighted Projects

BirdCLEF+ 2025: Species Classification with YAMNet | GitHub May 2025

  • Developed a robust system for classifying 206 endangered and under-studied animal species (including birds, amphibians, mammals, and insects) from the Middle Magdalena Valley of Colombia using audio recordings for the BirdCLEF 2025 competition.
  • Engineered a data preprocessing pipeline: converted audio files to mono, standardized sample rates by downsampling to 22 kHz (compatible with YAMNet) using scipy.signal.resample, and implemented custom functions for data cleaning and handling irregularities.
  • Created fixed-length audio representations (1000 samples per segment) by padding shorter sequences and chunking/padding longer ones, preparing data for consistent model input.
  • Utilized the pre-trained YAMNet model from TensorFlow Hub for feature extraction, transforming raw audio segments into 1024-dimensional embeddings.
  • Addressed significant class imbalance in the BirdCLEF 2025 dataset by strategically downsampling processed audio segments, limiting each of the 206 classes to a maximum of 1000 samples for training the final model.
  • Designed, built, and trained a deep neural network using the Keras functional API. The architecture included an input layer for 1024-dimensional embeddings, multiple Dense layers (512, 256, 128 units) with ReLU activations, Layer Normalization for improved stability, Dropout (0.2, 0.3) for regularization, and a residual connection, leading to a Softmax output layer for multi-class classification.
  • Optimized model training with EarlyStopping (monitoring val_loss with a patience of 5 epochs) to limit overfitting and restore the best-performing model weights.
  • Achieved a test accuracy of approximately 71% and a test loss of 1.14 on the classification of 206 species.
  • Leveraged parallel processing with concurrent.futures to expedite the audio data reading and initial preprocessing stages.
  • Technologies: Python, Keras, TensorFlow, TensorFlow Hub, Scikit-learn, Pandas, NumPy, SoundFile, SciPy, Matplotlib, tqdm, concurrent.futures.
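
The fixed-length segmentation and per-class cap described above can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the project's code: the function names `to_fixed_segments` and `cap_per_class`, the use of zero padding, and uniform random sampling for the cap are all hypothetical choices.

```python
import random
from collections import defaultdict

SEGMENT_LEN = 1000    # samples per segment, as stated in the list above
MAX_PER_CLASS = 1000  # per-class cap, as stated in the list above


def to_fixed_segments(samples, segment_len=SEGMENT_LEN, pad_value=0.0):
    """Split a 1-D sequence into segments of exactly segment_len samples.

    Short inputs and the trailing remainder are right-padded with pad_value
    (zero padding is an assumption, not confirmed by the project).
    """
    samples = list(samples)
    segments = []
    for start in range(0, len(samples), segment_len):
        chunk = samples[start:start + segment_len]
        chunk += [pad_value] * (segment_len - len(chunk))
        segments.append(chunk)
    return segments


def cap_per_class(labelled_segments, max_per_class=MAX_PER_CLASS, seed=0):
    """Keep at most max_per_class (label, segment) pairs for each label."""
    by_label = defaultdict(list)
    for label, segment in labelled_segments:
        by_label[label].append(segment)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    capped = []
    for label, segments in by_label.items():
        if len(segments) > max_per_class:
            segments = rng.sample(segments, max_per_class)
        capped.extend((label, seg) for seg in segments)
    return capped
```

In a real pipeline the segments would more likely be NumPy arrays, and the capped set would be passed on to the embedding step.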

Big Data Final Project | GitHub April 2025

  • Developed a model to differentiate between real and fake news articles using linguistic features.
  • Engineered features by calculating various text statistics (e.g., Coleman-Liau index, SMOG index, average sentence length, subjectivity, sentiment, word count, syllable count, Flesch reading ease, and Flesch-Kincaid grade level).
  • Implemented Logistic Regression, XGBoost, LSTM, and TF-IDF-based classifiers, and evaluated them with accuracy, F1-score, and confusion matrices. The best models reached 99% accuracy.
  • Technologies: Python, Pandas, Statsmodels, Textstat, TextBlob, Scikit-learn, XGBoost, Matplotlib, Seaborn, SHAP
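
To give a sense of what two of the text statistics above measure, here is a small from-scratch sketch of average sentence length and the Coleman-Liau index (0.0588·L − 0.296·S − 15.8, where L is letters per 100 words and S is sentences per 100 words). The project itself lists Textstat for these features; the function name `text_stats` and the simple regex tokenization are assumptions made for illustration, so values may differ slightly from Textstat's.

```python
import re


def text_stats(text):
    """Return average sentence length and Coleman-Liau index for text."""
    # Naive tokenization (an assumption): words are runs of letters/apostrophes,
    # sentences are split on ., ! or ?
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    letters = sum(ch.isalpha() for ch in text)
    letters_per_100_words = letters / len(words) * 100
    sentences_per_100_words = len(sentences) / len(words) * 100

    return {
        "avg_sentence_length": len(words) / len(sentences),
        "coleman_liau": (0.0588 * letters_per_100_words
                         - 0.296 * sentences_per_100_words
                         - 15.8),
    }
```

Each article would yield one such dictionary, which can then become a row of features for the classifiers.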

Hidden Markov Model For Stock Prediction | GitHub Jan 2025

  • Designed a Hidden Markov Model class from scratch and applied it to predict the $AAPL stock price.
  • Technologies: Pandas, Refinitiv, Python
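
As a rough idea of what a from-scratch Hidden Markov Model class involves, the sketch below implements the forward algorithm for a discrete HMM. It is not the project's class: the name `DiscreteHMM`, the discrete-observation setup, and any probabilities passed in are hypothetical.

```python
class DiscreteHMM:
    """Minimal discrete HMM exposing only the forward algorithm."""

    def __init__(self, start, trans, emit):
        self.start = start  # start[i]    = P(first state is i)
        self.trans = trans  # trans[i][j] = P(next state is j | current state is i)
        self.emit = emit    # emit[i][k]  = P(observing symbol k | state is i)
        self.n_states = len(start)

    def forward(self, observations):
        """Return P(observations) by summing over all hidden state paths."""
        if not observations:
            return 1.0
        n = self.n_states
        # alpha[i] = P(observations so far, current state is i)
        alpha = [self.start[i] * self.emit[i][observations[0]] for i in range(n)]
        for obs in observations[1:]:
            alpha = [
                sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][obs]
                for j in range(n)
            ]
        return sum(alpha)
```

For stock data, the observations could be discretized daily moves (for example 0 for down, 1 for up). Long sequences would need log-space or scaled probabilities to avoid underflow, which this sketch omits.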

J.P. Morgan Data for Good MKE Fellows Project | Pres Link October 2024

  • Received first place for our proposal to the MKE Fellows non-profit on which city they should expand to next.
    • Built k-means and agglomerative clustering models, engineered census data, and designed the presentation, all within 24 hours.
  • Technologies: Pandas, Python, Plotly

D3 Track Times | GitHub May 2024

  • Built a dual MongoDB/PSQL database of over 300K track records that is queried to compute percentile ranks within D3.
  • Used by several D3 teams in the PA area to aid recruiting and to model expected times.
  • Technologies: HTML, CSS, JS, PSQL, Mongo
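
The percentile-rank lookup can be illustrated in a few lines of Python. This is only an in-memory sketch: the site itself runs database queries, and the function name `percentile_rank` and the definition used here (share of recorded times strictly slower than the given time) are assumptions.

```python
from bisect import bisect_right


def percentile_rank(time_s, all_times_s):
    """Percentage of recorded times strictly slower than time_s.

    Lower times are better, so a faster mark gets a higher percentile.
    """
    ordered = sorted(all_times_s)
    if not ordered:
        raise ValueError("need at least one recorded time")
    # Records at indices >= bisect_right(...) are strictly greater (slower).
    slower = len(ordered) - bisect_right(ordered, time_s)
    return 100.0 * slower / len(ordered)
```

With five recorded times of 10.5, 11.0, 11.2, 11.8, and 12.4 seconds, a mark of 11.0 is faster than three of the five, so this definition places it at the 60th percentile.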
