
Hi there, I'm Darien Nouri 👋


📚 Completed my undergraduate degree at New York University in Data Science and Computer Science





🔍 Featured Projects

1. Impact of ML Model Complexity on Pairs Trading Optimization

Objective: Investigated the relationship between ML model complexity and forecasting accuracy for stock pair spread movements, with a focus on optimizing trading performance.

  • Led the research, benchmarking models of increasing complexity to test whether greater model capacity yields more accurate spread forecasts and, in turn, stronger trading performance.
  • Engineered a rich 280-feature set, including Refinitiv API-derived technical indicators, custom-scraped news headlines with BERT-based sentiment analysis, and Bloomberg Twitter sentiment data.
  • Evaluated model performance across the complexity spectrum: from simple Generalized Linear Models (GLMs), through tree ensembles such as Gradient Boosting Machines (GBMs) and Random Forests (RF), to neural networks, including a vanilla LSTM (~50k parameters), a hyperparameter-tuned LSTM (2M+ parameters), and a bidirectional LSTM with dropout.
  • Developed a backtesting framework incorporating Bayesian-adjusted mean-reversion strategies; the tuned LSTM model achieved 80.75% annualized returns, an average improvement of 105% over the non-deep-learning models.
  • GitHub Repository
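The core mean-reversion rule behind a pairs-trading backtest can be sketched in a few lines. This is a minimal illustration, not the project's framework: it applies a plain rolling z-score rule to a precomputed spread and leaves out both the Bayesian adjustment and the LSTM forecasts described above. The function names and the `window`, `entry`, and `exit` defaults are hypothetical.

```python
from statistics import mean, stdev


def zscore_signals(spread, window=20, entry=2.0, exit=0.5):
    """Return a position per step: +1 long the spread, -1 short, 0 flat.

    Hypothetical rule: short when the spread sits `entry` standard deviations
    above its trailing mean, long when `entry` below, flat once |z| < `exit`.
    """
    positions = []
    pos = 0
    for t in range(len(spread)):
        if t < window:
            positions.append(0)  # not enough history to estimate mean/std yet
            continue
        hist = spread[t - window:t]  # trailing window, excludes the current point
        mu, sigma = mean(hist), stdev(hist)
        z = 0.0 if sigma == 0 else (spread[t] - mu) / sigma
        if pos == 0:
            if z > entry:
                pos = -1  # spread unusually wide: bet on it narrowing
            elif z < -entry:
                pos = 1   # spread unusually narrow: bet on it widening
        elif abs(z) < exit:
            pos = 0       # spread has reverted: close the trade
        positions.append(pos)
    return positions


def backtest_pnl(spread, positions):
    """Sum of (position held at t) * (spread change from t to t+1)."""
    return sum(positions[t] * (spread[t + 1] - spread[t])
               for t in range(len(spread) - 1))
```

The position chosen at step t is only applied to the move from t to t+1, so the decision never uses future prices.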

2. Urban Dynamics and Real Estate Markets: Enhancing Market Forecasts with Non-Traditional Data

Objective: Explored the predictive power of non-traditional urban data sources in forecasting real estate trends and Real Estate Investment Trust (REIT) performance.

  • Led research on the efficacy of non-traditional urban data in predicting real estate trends and REIT performance, analyzing datasets including Citibike usage, building complaints, business operations, health inspections, and evictions.
  • Architected a scalable, cloud-based ecosystem to process and analyze terabytes of diverse datasets, using MongoDB, AWS services, the Dask distributed computing framework, and PyArrow for efficient columnar data handling.
  • Engineered high-performance data acquisition systems, including a web scraper that processes over 1 million MLS listings daily and a parallel ingestion framework for more than 1 billion historical Citibike rides.
  • Integrated alternative data sources selected through Granger-causality testing, improving market index forecasts by 25% and REIT predictions by 34.2%.
  • Designed an interactive Streamlit application for non-technical stakeholders, allowing intuitive exploration of correlations between alternative data and market trends. (Feature Explorer Demo Render)
  • GitHub Repository | IEEE Report (Pending Submission)
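Granger-causality screening asks whether past values of a candidate series x improve forecasts of a target y beyond what y's own past already provides. A minimal pure-Python sketch of the lag-1 F-test is below, assuming aligned, equal-length series; it compares the residual sum of squares of a restricted model (y_t on y_{t-1}) with an unrestricted one that adds x_{t-1}. The project's actual lag order and tooling are not stated (statsmodels' `grangercausalitytests` is a common library choice), and the helper names here are hypothetical.

```python
def _ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solving the normal equations."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y]
    a = [row[:] + [b] for row, b in zip(xtx, xty)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def granger_f_lag1(x, y):
    """F-statistic for H0: x_{t-1} adds no predictive power for y_t beyond y_{t-1}."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = _ols_rss(restricted, target)
    rss_u = _ols_rss(unrestricted, target)
    n, k_u, q = len(target), 3, 1  # observations, unrestricted params, restrictions
    return ((rss_r - rss_u) / q) / (rss_u / (n - k_u))
```

A large F (compared against the F(1, n - 3) distribution) suggests the candidate series carries predictive information; ranking candidates by this statistic is one plausible way to shortlist alternative data sources.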

3. Real Estate Valuator & Market Analysis Frameworks

Objective: Developed a multi-model machine learning framework to valuate residential properties and forecast sale probabilities, integrating data from diverse sources.

  • Engineered a multi-model ML framework using ensemble methods, including gradient boosting, for residential property valuation and sale probability forecasting, integrating data from Zillow, Yelp, and other sources.
  • Developed an interactive Streamlit application that allows users to access real-time property valuations, explore detailed market data, and visualize macroeconomic trends in the real estate sector.
  • Valuation App Demo | Market Analysis App Demo | Analysis GitHub Repository

4. Web App Deployment Service

Objective: Automated the deployment of Python-based web applications on AWS EC2 instances, streamlining continuous integration and delivery processes.

  • Developed a scalable and automated deployment system for Python-based web applications (Dash, Streamlit) on AWS EC2 instances, enhancing the efficiency of continuous integration and delivery (CI/CD).
  • Integrated GitHub webhooks so that each code push automatically triggers a deployment, streamlining the release of MVPs and new features.
  • Leveraged PM2 for continuous process management and Nginx as a reverse proxy, providing high availability and load balancing for deployed applications.
  • Generated Nginx configuration automatically from the set of deployed applications, so custom domains and routing rules stay in sync as apps are added or removed.
  • GitHub Repository
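Two pieces of the deployment flow described above can be sketched briefly: verifying that a push webhook really came from GitHub, and rendering one Nginx server block per deployed app. This is a minimal sketch, not the repository's code. It assumes a Python webhook handler and a hypothetical in-memory registry of `{"domain": ..., "port": ...}` entries; the function names and template are illustrative. The signature format follows GitHub's `X-Hub-Signature-256` header (`sha256=` followed by the hex HMAC-SHA256 of the raw request body).

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: bytes, body: bytes, header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking, through timing, how much of the signature matched
    return hmac.compare_digest(expected.encode(), (header or "").encode())


# One server block per app: Nginx forwards each domain to the app's local port.
SERVER_TEMPLATE = """server {{
    listen 80;
    server_name {domain};

    location / {{
        proxy_pass http://127.0.0.1:{port};
        proxy_http_version 1.1;
        # Streamlit's frontend uses a WebSocket, so pass the upgrade headers through
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }}
}}
"""


def render_nginx_config(apps):
    """Render a full config from a hypothetical registry of domain/port entries."""
    blocks = [SERVER_TEMPLATE.format(domain=a["domain"], port=a["port"])
              for a in sorted(apps, key=lambda a: a["domain"])]
    return "\n".join(blocks)
```

After writing the rendered text into Nginx's configuration directory (the exact path is deployment-specific), `nginx -t` validates it and `nginx -s reload` applies it without dropping connections. Sorting by domain keeps the generated file stable across runs, which makes diffs easy to review.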

5. Embedding-Based Clustering of Political News Headlines

Objective: Applied NLP techniques to classify the political orientation of news headlines using advanced embedding and clustering methods.

  • Implemented text preprocessing, feature extraction, and clustering algorithms, including Spectral Clustering and K-Means, to classify the political orientation of news headlines.
  • Used BERT models to embed the headlines, giving the clustering algorithms a rich representation of the text.
  • Explored dimensionality reduction techniques like PCA and t-SNE for effective visualization of high-dimensional data, enhancing the interpretability of clustering results.
  • GitHub Repository
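The K-Means step can be illustrated with a minimal Lloyd's-algorithm sketch over embedding vectors. It assumes the headline embeddings were already computed (for example, by a BERT model) and are passed in as plain lists of floats; vectors are L2-normalized first so that Euclidean distance tracks cosine similarity. The farthest-first initialization and all function names are illustrative choices rather than details from the repository, and Spectral Clustering is not shown.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(embeddings, k, max_iter=100):
    """Return one cluster label per embedding."""
    points = [l2_normalize(e) for e in embeddings]
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

The labels are arbitrary integers; tying a cluster to a political orientation still requires inspecting or annotating its headlines, and PCA or t-SNE projections of the same vectors are a natural way to check how well the clusters separate.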

6. Web Scrapers

Objective: Developed high-performance web scrapers to automate data extraction from online platforms, supporting several projects and research initiatives.

  • Zillow Scraper: Built a high-performance web scraper on Azure that extracts over 2 million property listings per run on an automated weekly schedule. GitHub Repository
  • LinkedIn Scraper: Created a LinkedIn scraper that extracts data from both company and user profiles, including company posts, work experience, and education. GitHub Repository
  • Yelp Scraper: Developed a Selenium-based web scraper to extract restaurant data, including information on health ratings, review counts, and business details. GitHub Repository

Pinned

  1. Trading-Strategy-Project (Public)

    Repository for the deep learning trading strategy project from Advanced Topics in DS.

    Jupyter Notebook

  2. alt-data-real-estate-predictions (Public)

    This repo contains the project's code for data collection, exploratory analysis, and modeling. The central goal of the related project was to identify and evaluate the efficacy…

    Jupyter Notebook

  3. NLP-Document-Classification (Public)

    This repository explores the correlation between news headlines' textual embeddings and their political orientation. Using clustering and transformer-based embeddings, the goal is to classify news …

    Jupyter Notebook

  4. Streamlit-RE-Analysis-Web-App (Public)

    This is a Streamlit web app that provides an exploratory data analysis on real estate data. The app loads data from a CSV file, preprocesses the data, and provides several visualizations for data e…

    Python

  5. NYU-Advanced-ML-Deep-Learning-Studies (Public)

    This repo contains a collection of notebooks pertaining to various advanced topics in ML and deep learning that were created as part of NYU studies.

    Jupyter Notebook

  6. NYU-Quantum-Computing (Public)

    Repo contains files and work related to Special Topics in Quantum Computing.

    Jupyter Notebook