
aliasad059/CISC834_GA


🧩 A Reproduction Study on the Co-evolution of Infrastructure and Source Code in the Age of AI

📘 Overview

This repository accompanies the study “A Reproduction Study on the Co-evolution of Infrastructure and Source Code in the Age of AI”, which revisits and extends the foundational work by Jiang and Adams (2015) to the current AI-driven software landscape.

Infrastructure as Code (IaC) has become a cornerstone of DevOps, enabling automated and reproducible deployments. Yet maintaining and evolving IaC remains challenging, especially in AI projects, where machine learning pipelines and data workflows tightly intertwine infrastructure and application code.

This study examines how infrastructure–code co-evolution has changed in the era of AI by reproducing the original methodology on modern projects and introducing an AI vs. Non-AI comparative analysis.


🧠 Research Design

📂 Dataset

We analyzed one year of commit history from each of 20 GitHub repositories that use Kubernetes for deployment:

  • 10 AI-related projects (with ML/AI components, training, or inference pipelines)
  • 10 Non-AI projects (general-purpose software systems)
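Collecting a year of history per repository can be sketched with `git log`. The snippet below is a minimal sketch, not the study's actual tooling: it assumes commits are exported with `git log --since="1 year ago" --numstat --date=short --pretty=format:@%H|%ad`, and the `@` header marker and the names `Commit`, `read_history`, and `parse_numstat_log` are our own illustrative choices.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    date: str  # YYYY-MM-DD, as printed by --date=short
    # (path, lines_added, lines_deleted); binary files are recorded as 0/0.
    # Renamed files keep git's "{old => new}" path notation verbatim.
    files: list = field(default_factory=list)


def read_history(repo_path):
    """Run git log over the last year (hypothetical helper; requires git on PATH)."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--since=1 year ago", "--numstat",
         "--date=short", "--pretty=format:@%H|%ad"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_numstat_log(text):
    """Turn the git log output above into a list of Commit records."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator lines between commits
        if line.startswith("@"):
            sha, date = line[1:].split("|", 1)
            commits.append(Commit(sha=sha, date=date))
            continue
        added, deleted, path = line.split("\t", 2)
        # git prints "-" instead of line counts for binary files
        commits[-1].files.append(
            (path,
             0 if added == "-" else int(added),
             0 if deleted == "-" else int(deleted))
        )
    return commits
```

Usage would be `parse_numstat_log(read_history("path/to/repo"))`, where the path is a placeholder for a local clone.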

Each file in every repository was categorized as:

  • Infrastructure – IaC files such as Kubernetes manifests, Dockerfiles, Helm charts, etc.
  • Build – CI/CD pipelines and automation scripts
  • Production – Application logic or ML model code
  • Test – Unit, integration, or model evaluation tests
  • Miscellaneous – Documentation, configs, and supporting files
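A path-based classifier for these five categories might look like the sketch below. The glob patterns and directory names are hypothetical heuristics chosen for illustration; the study's actual categorization rules may differ. Rules are checked in order and the first match wins.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical rules: (category, filename globs, directory names).
# Order matters: the first rule that matches a path decides its category.
RULES = [
    ("Infrastructure",
     ["Dockerfile*", "*.tf", "Chart.yaml", "values*.yaml", "kustomization.yaml"],
     {"k8s", "kubernetes", "helm", "charts", "deploy", "manifests"}),
    ("Build",
     ["Makefile", "Jenkinsfile", ".gitlab-ci.yml", "*.sh", "setup.py", "pyproject.toml"],
     {"workflows", ".circleci", "ci"}),
    ("Test",
     ["test_*.py", "*_test.py", "*_test.go", "*.test.ts", "*.spec.ts"],
     {"test", "tests", "__tests__"}),
    ("Production",
     ["*.py", "*.go", "*.java", "*.ts", "*.js", "*.rs", "*.cpp", "*.c"],
     set()),
]


def classify_file(path):
    """Return one of the five categories for a repository-relative path."""
    p = PurePosixPath(path)
    parent_dirs = set(p.parts[:-1])  # directory components only
    for category, globs, dir_names in RULES:
        if any(fnmatchcase(p.name, g) for g in globs) or parent_dirs & dir_names:
            return category
    return "Miscellaneous"  # docs, configs, and anything unmatched
```

Matching on both file names and enclosing directories lets a generic file such as `k8s/deployment.yaml` be treated as infrastructure without listing every manifest name.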

🎯 Research Questions

Preliminary Questions

  • PQ1: How many infrastructure files does a project have?
  • PQ2: How many infrastructure file changes occur per month?
  • PQ3: How large are infrastructure system changes?
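PQ2 and PQ3 can be computed from per-commit file changes. The sketch below is an assumption-laden illustration: it takes commits as `(iso_date, [(path, added, deleted), ...])` tuples and an `is_infra` predicate, and it assumes churn (lines added plus lines deleted per infrastructure file change) as the size measure for PQ3. The function names are hypothetical.

```python
from collections import Counter
from statistics import median


def monthly_infra_changes(commits, is_infra):
    """PQ2: count infrastructure file changes per calendar month (YYYY-MM).

    Months that have commits but no infrastructure changes appear with 0;
    months with no commits at all are absent.
    """
    per_month = Counter()
    for date, files in commits:
        per_month[date[:7]] += sum(1 for path, _, _ in files if is_infra(path))
    return dict(per_month)


def infra_change_sizes(commits, is_infra):
    """PQ3 (assumed metric): churn of each infrastructure file change."""
    return [a + d for _, files in commits for path, a, d in files if is_infra(path)]


def summarize(commits, is_infra):
    """Median monthly change count and median change size for one project."""
    monthly = monthly_infra_changes(commits, is_infra)
    sizes = infra_change_sizes(commits, is_infra)
    return {
        "median_monthly_changes": median(monthly.values()) if monthly else 0,
        "median_change_size": median(sizes) if sizes else 0,
    }
```

Medians are used here because change counts and sizes in repository data tend to be skewed by a few very large commits.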

Main Research Question

  • RQ1: How tight is the coupling between infrastructure code and other kinds of code?
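Coupling is quantified with the association-rule metrics named in the comparison below (support, confidence, lift). A minimal sketch, assuming each commit is one transaction whose items are the file categories it touched; support is expressed as a fraction of all commits here, which may differ from how the original study reports it. `co_change_metrics` is a hypothetical name.

```python
def co_change_metrics(commit_categories, a, b):
    """Support, confidence, and lift for the rule a -> b over commits.

    commit_categories: list of sets, one per commit, holding the categories
    that commit touched, e.g. {"Infrastructure", "Production"}.
    """
    n = len(commit_categories)
    n_a = sum(1 for cats in commit_categories if a in cats)
    n_b = sum(1 for cats in commit_categories if b in cats)
    n_ab = sum(1 for cats in commit_categories if a in cats and b in cats)
    support = n_ab / n if n else 0.0          # P(a and b)
    confidence = n_ab / n_a if n_a else 0.0   # P(b | a)
    # lift > 1: a and b change together more often than if independent
    lift = confidence / (n_b / n) if n and n_b else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}
```

For example, `co_change_metrics(history, "Infrastructure", "Production")` measures how often a commit touching infrastructure also touches production code, where `history` stands for a project's list of per-commit category sets.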

🔬 Comparative Analysis

We performed two levels of comparison:

  1. Reproduction Validation: Comparing our computed metrics (support, confidence, lift) and file statistics against the results from Jiang & Adams (2015) to confirm methodological consistency.
  2. AI vs. Non-AI Comparison: Splitting our dataset into AI and Non-AI groups to observe how the presence of machine learning pipelines influences IaC evolution and co-change behavior.
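The AI vs. Non-AI split can be sketched as grouping per-project metrics and comparing group medians. This is an illustrative sketch only: the input layout and the name `compare_groups` are assumptions, and whether the study also applied a significance test is not stated here, so the sketch reports medians alone.

```python
from statistics import median


def compare_groups(project_metrics, ai_projects, metric):
    """Median of one per-project metric for the AI and Non-AI groups.

    project_metrics: {project_name: {metric_name: value, ...}}
    ai_projects: set of project names labelled as AI-related.
    """
    ai = [m[metric] for p, m in project_metrics.items() if p in ai_projects]
    non_ai = [m[metric] for p, m in project_metrics.items() if p not in ai_projects]
    return {
        "AI": median(ai) if ai else None,
        "Non-AI": median(non_ai) if non_ai else None,
    }
```

With 10 projects per group, medians are less sensitive than means to a single project with unusually heavy infrastructure activity.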

🔍 Key Findings

  • Modern infrastructure documentation has become more concise, standardized, and stable.
  • AI projects show stronger coupling between infrastructure and production files, reflecting the integration of ML pipelines with deployment and serving environments.
  • The evolution pattern of IaC identified by Jiang and Adams (2015) still holds, but it manifests differently within AI-driven ecosystems, which emphasizes the need for maintainable and modular infrastructure design in MLOps workflows.

For full details and data interpretation, see report.pdf.


🧩 Relation to Original Study

This work is based on and extends:

Y. Jiang and B. Adams, “Co-evolution of infrastructure and source code—an empirical study,” 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR), pp. 45–55.

We replicated their analysis using a new dataset of contemporary Kubernetes-based open-source projects while introducing an AI-awareness dimension to understand how machine learning integration affects infrastructure evolution.

