## Project Objectives
The primary goal of this project is to use the **Ant Colony Optimization (ACO)** algorithm to improve the estimation of **Solution Gas-Oil Ratio (GOR)** by identifying correlations with **Pressure-Volume-Temperature (PVT)** conditions. This approach will enable the following:

1. **Optimize GOR Estimation**: Enhance the accuracy of GOR predictions based on PVT correlations.
2. **Identify Optimal PVT Conditions**: Discover the conditions that maximize or minimize GOR values.
3. **Uncover Correlation Patterns**: Reveal the relationships between GOR and various PVT parameters (e.g., downhole pressure, temperature) to inform better reservoir management.

## Key Columns for ACO Analysis
The ACO algorithm will focus on the following columns to determine optimal conditions for GOR:

- **Calculated_GOR**: The target variable representing the solution gas-oil ratio.
- **AVG_DOWNHOLE_PRESSURE**: Downhole pressure, a critical factor affecting GOR.
- **AVG_DOWNHOLE_TEMPERATURE**: Downhole temperature, which influences the solution gas-oil ratio.
- **BORE_OIL_VOL** and **BORE_GAS_VOL**: Volumes of produced oil and gas, which can help establish relationships with GOR.
- **Wellbore Name** *(optional)*: To distinguish between different wells if unique PVT correlations are applicable.
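
As a rough sketch of how these columns might be prepared, the snippet below builds a tiny DataFrame and derives a GOR column as produced gas volume divided by produced oil volume. The row values are made up, and the gas/oil formula is an assumption about how `Calculated_GOR` was obtained; the notebook does not show that step.

```python
import pandas as pd

# Hypothetical sample rows; the values are illustrative, not real well data.
pvt_data = pd.DataFrame({
    "AVG_DOWNHOLE_PRESSURE": [250.0, 245.0, 240.0],
    "AVG_DOWNHOLE_TEMPERATURE": [105.0, 106.0, 107.0],
    "BORE_OIL_VOL": [1000.0, 800.0, 500.0],
    "BORE_GAS_VOL": [150000.0, 128000.0, 90000.0],
})

# Drop rows with no oil production so the ratio below is always defined.
pvt_data = pvt_data[pvt_data["BORE_OIL_VOL"] > 0].copy()

# Assumption: GOR is produced gas volume divided by produced oil volume.
pvt_data["Calculated_GOR"] = pvt_data["BORE_GAS_VOL"] / pvt_data["BORE_OIL_VOL"]

print(pvt_data[["AVG_DOWNHOLE_PRESSURE", "Calculated_GOR"]])
```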

## Approach
1. **ACO Algorithm Setup**: Configure the ACO algorithm to explore different combinations of downhole pressure, temperature, and other parameters to optimize the calculated GOR.
2. **Objective Function**: Define an objective function for ACO to minimize errors between predicted and actual GOR values.
3. **Analysis of Results**: Examine the output to identify PVT conditions that consistently yield optimal GOR values, aiding in well performance optimization.
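
One possible form of the objective function in step 2 is a root-mean-square error between predicted and measured GOR. This is only a sketch: the function name `gor_rmse` is hypothetical, and RMSE is one common choice among several. Note that the class defined below uses path length in feature space as its cost rather than a prediction error.

```python
import numpy as np

def gor_rmse(predicted_gor, actual_gor):
    """Root-mean-square error between predicted and measured GOR values."""
    predicted = np.asarray(predicted_gor, dtype=float)
    actual = np.asarray(actual_gor, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Errors of +3 and -4 give sqrt((9 + 16) / 2) = sqrt(12.5).
error = gor_rmse([103.0, 196.0], [100.0, 200.0])
print(error)
```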


In [6]:
import sympy as smp
from sympy import*
import numpy as np
from sklearn.linear_model import LinearRegression

In [7]:
import numpy as np

class AntColonyOptimization:
    def __init__(self, pvt_data, num_ants=10, num_iterations=100, decay=0.95, alpha=1, beta=2):
        self.pvt_data = pvt_data
        self.num_ants = num_ants
        self.num_iterations = num_iterations
        self.decay = decay
        self.alpha = alpha
        self.beta = beta
        self.distance_matrix = self.calculate_distance_matrix()
        self.pheromone_matrix = np.ones_like(self.distance_matrix) / len(pvt_data)
        self.shortest_path = None
        self.shortest_cost = np.inf


### Ant Colony Optimization Class Initialization

This part of the code defines the initialization function for the `AntColonyOptimization` class, setting up key parameters and structures necessary for the algorithm.

#### Explanation

1. **Library Import**:  
   The `numpy` library is imported to provide tools for matrix operations and efficient numerical calculations.

2. **Class Definition**:  
   The `AntColonyOptimization` class is created to structure the algorithm in a single, organized class.

3. **Initialization Method (`__init__`)**:  
   The `__init__` function sets up the initial parameters and data for the algorithm, which includes:

   - **Parameters**:
     - `pvt_data`: Represents the input dataset, typically containing various PVT (pressure, volume, temperature) properties.
     - `num_ants`: Defines the number of ants in each simulation cycle. A higher number increases exploration but also computational cost.
     - `num_iterations`: Total number of cycles the algorithm will run to find an optimal path. Higher values allow more refinement but require more time.
     - `decay`: The factor by which every pheromone value is multiplied in each iteration (the default of 0.95 retains 95% and evaporates 5%). This prevents early paths from dominating and encourages fresh exploration.
     - `alpha` and `beta`: These parameters influence how much pheromone strength (`alpha`) and distance (`beta`) impact an ant’s path choices:
       - Higher `alpha` values increase reliance on pheromone trails.
       - Higher `beta` values emphasize shorter distances, biasing the ants toward closer paths.

   - **Distance Matrix**:  
     The `distance_matrix` is calculated by calling `calculate_distance_matrix()`, which computes the Euclidean distance between every pair of points in the data. This matrix represents all possible "paths" the ants can take.

   - **Pheromone Matrix**:  
     The `pheromone_matrix` is initialized with equal values across all paths, representing initial pheromone levels that ants will use to choose paths. The matrix is divided by the total number of data points, ensuring small, uniform values at the start.

   - **Shortest Path Tracking**:  
     The algorithm stores the `shortest_path` and `shortest_cost` found so far to track the best route identified by the ants. Initially, `shortest_path` is `None`, and `shortest_cost` is set to infinity so that any path found will be an improvement.
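
The initial state described above can be reproduced on a toy problem. This is a minimal sketch with a hypothetical 4-point distance matrix, not the notebook's data:

```python
import numpy as np

# Toy symmetric distance matrix for 4 points (hypothetical values).
distance_matrix = np.array([
    [0.0, 2.0, 5.0, 1.0],
    [2.0, 0.0, 3.0, 4.0],
    [5.0, 3.0, 0.0, 6.0],
    [1.0, 4.0, 6.0, 0.0],
])
num_points = len(distance_matrix)

# Same rule as in __init__: every edge starts with pheromone 1 / num_points.
pheromone_matrix = np.ones_like(distance_matrix) / num_points

# Best-so-far trackers: no path yet, and a cost that any real path will beat.
shortest_path = None
shortest_cost = np.inf

print(pheromone_matrix[0, 1])  # 0.25
```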


In [8]:
def calculate_distance_matrix(self):
    num_points = len(self.pvt_data)
    dist_matrix = np.zeros((num_points, num_points))
    
    for i in range(num_points):
        for j in range(num_points):
            if i != j:
                dist_matrix[i, j] = np.sqrt(
                    (self.pvt_data.iloc[i]['AVG_DOWNHOLE_PRESSURE'] - self.pvt_data.iloc[j]['AVG_DOWNHOLE_PRESSURE']) ** 2 +
                    (self.pvt_data.iloc[i]['AVG_DOWNHOLE_TEMPERATURE'] - self.pvt_data.iloc[j]['AVG_DOWNHOLE_TEMPERATURE']) ** 2 +
                    (self.pvt_data.iloc[i]['AVG_ANNULUS_PRESSURE'] - self.pvt_data.iloc[j]['AVG_ANNULUS_PRESSURE']) ** 2 +
                    (self.pvt_data.iloc[i]['AVG_CHOKE_SIZE_P'] - self.pvt_data.iloc[j]['AVG_CHOKE_SIZE_P']) ** 2 +
                    (self.pvt_data.iloc[i]['GOR'] - self.pvt_data.iloc[j]['GOR']) ** 2 
                )
    return dist_matrix


### Distance Matrix Calculation

This function calculates the distance matrix, which quantifies the "distance" between every pair of points in the dataset. Each entry in the matrix represents the Euclidean distance between two data points based on key features related to downhole conditions.

#### Explanation

1. **Initialization of Matrix**:  
   The function begins by determining the total number of points (`num_points`) in the `pvt_data` dataset. A zero-filled square matrix of shape `(num_points, num_points)` is created, where each element will later store the computed distance between two points.

2. **Loop through Points**:  
   A nested loop iterates through each pair of points `(i, j)`, where:
   - For pairs where `i != j` (i.e., different points), the function calculates the Euclidean distance between points `i` and `j`.
   - When `i == j` (the same point), the distance remains zero, as it’s a self-comparison.

3. **Distance Calculation**:  
   The Euclidean distance between points `i` and `j` is calculated using five features:
   - `AVG_DOWNHOLE_PRESSURE`
   - `AVG_DOWNHOLE_TEMPERATURE`
   - `AVG_ANNULUS_PRESSURE`
   - `AVG_CHOKE_SIZE_P`
   - `GOR`

   For each feature, the difference between the values at points `i` and `j` is squared, and then all squared differences are summed. The square root of this sum is taken to obtain the Euclidean distance. This value represents the multidimensional distance between the conditions at these two points.

4. **Return Matrix**:  
   After all pairs are processed, the distance matrix is returned. This matrix will be used by the ants to make path decisions, with closer points being more attractive paths in the optimization process.
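
The nested loop can also be written with NumPy broadcasting, which avoids repeated `iloc` lookups. The sketch below is an alternative formulation, not the notebook's method; `pairwise_distances`, `FEATURES`, and the toy rows are hypothetical names and values introduced for illustration.

```python
import numpy as np
import pandas as pd

FEATURES = [
    "AVG_DOWNHOLE_PRESSURE",
    "AVG_DOWNHOLE_TEMPERATURE",
    "AVG_ANNULUS_PRESSURE",
    "AVG_CHOKE_SIZE_P",
    "GOR",
]

def pairwise_distances(pvt_data, features=FEATURES):
    """Euclidean distance between every pair of rows over the given columns."""
    points = pvt_data[features].to_numpy(dtype=float)
    # Broadcasting yields an (n, n, n_features) array of coordinate differences.
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical rows: rows 0 and 1 differ by 3 in pressure and 4 in temperature,
# rows 0 and 2 differ by 12 in choke size only.
toy = pd.DataFrame({
    "AVG_DOWNHOLE_PRESSURE": [200.0, 203.0, 200.0],
    "AVG_DOWNHOLE_TEMPERATURE": [100.0, 104.0, 100.0],
    "AVG_ANNULUS_PRESSURE": [20.0, 20.0, 20.0],
    "AVG_CHOKE_SIZE_P": [50.0, 50.0, 62.0],
    "GOR": [150.0, 150.0, 150.0],
})
dist = pairwise_distances(toy)
print(dist[0, 1])  # 5.0, from the 3-4-5 right triangle
```

Because the columns carry different units, whichever column has the largest numeric spread dominates this distance unless the features are scaled first.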


In [58]:
def run(self):
    for _ in range(self.num_iterations):
        ants_paths = self.generate_ant_paths()
        self.update_pheromone(ants_paths)
        shortest_path, shortest_cost = self.get_shortest_path(ants_paths)
        
        if shortest_cost < self.shortest_cost:
            self.shortest_path = shortest_path
            self.shortest_cost = shortest_cost
            
    # Retrieve GOR values for the shortest path
    gor_values = [self.pvt_data.iloc[i]['GOR'] for i in self.shortest_path]
    return self.shortest_path, self.shortest_cost, gor_values

### Ant Colony Optimization Execution (Run Function)

This `run` function executes the main Ant Colony Optimization (ACO) loop over a specified number of iterations. Here’s a breakdown of each component of the function:

1. **Iteration Loop**:  
   The loop runs `self.num_iterations` times, which is the total number of cycles the ants perform in search of an optimal path. Each iteration lets the ants explore new paths and gradually shifts the pheromone trail toward a better solution.
   > The algorithm considers all rows in the dataset as potential nodes (or "cities") in the optimization process.

2. **Generating Ant Paths**:  
   Within each iteration, the `generate_ant_paths` function is called to simulate the paths taken by each ant through the data. Each ant follows its own path based on pheromone levels and distances, aiming to find shorter paths with higher pheromone concentration.

3. **Updating Pheromone Levels**:  
   The `update_pheromone` function updates the pheromone matrix using the paths generated by the ants. Every edge an ant used receives a deposit inversely proportional to that edge's length, so short edges become more attractive in future iterations. This acts as the "memory" of the system, steering later ants toward edges that led to lower costs.

4. **Tracking the Shortest Path**:  
   The `get_shortest_path` function is called to evaluate the cost of each ant’s path from the current iteration and identify the shortest one. If this path has a lower cost than any previously found, the algorithm updates `self.shortest_path` and `self.shortest_cost` to reflect the new best solution.

5. **Return Optimal Solution**:  
   After completing all iterations, the function looks up the `GOR` value of every row on the best path and returns three items: the best path (`self.shortest_path`), its cost (`self.shortest_cost`), and the GOR values in path order (`gor_values`).


**Iteration over Rows**: Each ant traverses all rows (nodes) to complete a path. Each node represents one row of the dataset, defined by columns such as pressure, temperature, and choke size.

**Path Building**: In this implementation every ant's path visits every row exactly once; `generate_ant_paths` keeps adding points until none are left unvisited.

**Pheromone Update and Distance Matrix**: The distance matrix is computed once from all pairs of rows (nodes) and stays fixed; only the pheromone matrix changes. After each iteration, the pheromone on every edge evaporates and the edges used by the ants are reinforced.
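
To see the whole loop in one place, here is a condensed, stand-alone sketch that follows the same steps (build paths, evaporate, deposit, track the best path) on a toy distance matrix. It is not the notebook's class: `run_aco` and its `seed` parameter are hypothetical, and it uses a seeded `np.random.default_rng` generator so the run is repeatable.

```python
import numpy as np

def run_aco(dist, num_ants=5, num_iterations=20, decay=0.95, alpha=1, beta=2, seed=0):
    """Toy ACO loop over a precomputed distance matrix (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    pheromone = np.ones_like(dist) / n
    best_path, best_cost = None, np.inf

    for _ in range(num_iterations):
        paths = []
        for _ in range(num_ants):
            path = [int(rng.integers(n))]
            while len(path) < n:
                current = path[-1]
                unvisited = np.ones(n, dtype=bool)
                unvisited[path] = False
                # Pheromone^alpha times inverse-distance^beta, zero for visited nodes.
                weights = (pheromone[current] ** alpha) * unvisited / (dist[current] + 1e-10) ** beta
                path.append(int(rng.choice(n, p=weights / weights.sum())))
            paths.append(path)

        # Evaporate, then deposit along every edge each ant used.
        pheromone *= decay
        for path in paths:
            for a, b in zip(path[:-1], path[1:]):
                pheromone[a, b] += 1.0 / (dist[a, b] + 1e-10)

        # Keep the cheapest path seen in any iteration.
        for path in paths:
            cost = sum(dist[a, b] for a, b in zip(path[:-1], path[1:]))
            if cost < best_cost:
                best_path, best_cost = path, cost
    return best_path, best_cost

# Five hypothetical points on a line; the shortest possible open path has length 4.
coords = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dist = np.abs(coords[:, None] - coords[None, :])
best_path, best_cost = run_aco(dist)
print(best_path, best_cost)
```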

In [1]:
def generate_ant_paths(self):
    num_points = len(self.pvt_data)
    ants_paths = []
    
    for _ in range(self.num_ants):
        start = np.random.randint(num_points)
        path = [start]
        visited = set([start])
        
        while len(visited) < num_points:
            probs = self.calculate_probabilities(path[-1], visited)
            next_point = np.random.choice(num_points, p=probs)
            path.append(next_point)
            visited.add(next_point)
            
        ants_paths.append(path)
    return ants_paths


### Generating Ant Paths (`generate_ant_paths` function)

The `generate_ant_paths` function simulates the movement of each ant in the Ant Colony Optimization algorithm, creating a unique path based on pheromone levels and distances between points. Here’s a detailed explanation:

1. **Initialize Parameters**:  
   - `num_points` holds the total number of data points in the dataset (`self.pvt_data`).
   - `ants_paths` is an empty list where all paths taken by each ant in a single iteration will be stored.

2. **Set Ant Start Points**:  
   For each ant (as determined by `self.num_ants`), the function randomly selects a starting point using `np.random.randint(num_points)`. The starting point is added to a list, `path`, which stores the sequence of points visited by the current ant.

3. **Track Visited Points**:  
   A `visited` set is created to ensure each point is only visited once per ant path. The starting point is added to this set.

4. **Construct the Ant’s Path**:  
   - In a loop, the ant keeps moving to new points while unvisited points remain (`len(visited) < num_points`).
   - For each new move, the `calculate_probabilities` function is called, which calculates the probability of moving to each point based on pheromone levels and distance; already-visited points get probability zero.
   - `np.random.choice(num_points, p=probs)` draws the next point at random according to these probabilities, so closer, pheromone-rich points are more likely but not guaranteed to be chosen.
   - This next point is added to the `path` list and recorded in `visited` to avoid revisiting it within the current path.

5. **Store and Return Paths**:  
   Once an ant completes its path, the path is appended to `ants_paths`. After all ants have finished, `ants_paths` is returned, containing a complete set of paths taken by all ants in this iteration.
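
The path-building steps above can be exercised in isolation. In this sketch, `build_path` mirrors the loop for a single ant, and `uniform_over_unvisited` is a hypothetical stand-in for `calculate_probabilities` that gives every unvisited point equal odds.

```python
import numpy as np

def build_path(num_points, choose_probs, rng):
    """Build one ant's path that visits every point exactly once."""
    start = int(rng.integers(num_points))
    path = [start]
    visited = {start}
    while len(visited) < num_points:
        probs = choose_probs(path[-1], visited)
        # Points with probability 0 (the visited ones) can never be drawn.
        next_point = int(rng.choice(num_points, p=probs))
        path.append(next_point)
        visited.add(next_point)
    return path

def uniform_over_unvisited(current_point, visited, num_points=6):
    """Hypothetical stand-in: equal probability for every unvisited point."""
    probs = np.array([0.0 if j in visited else 1.0 for j in range(num_points)])
    return probs / probs.sum()

rng = np.random.default_rng(42)
path = build_path(6, uniform_over_unvisited, rng)
print(path)  # some ordering of the points 0..5
```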


In [3]:
def calculate_probabilities(self, current_point, visited):
    pheromone = self.pheromone_matrix[current_point]
    dist = self.distance_matrix[current_point]
    unvisited_mask = ~np.isin(np.arange(len(pheromone)), list(visited))
    
    row = (pheromone ** self.alpha) * (unvisited_mask * (1.0 / (dist + 1e-10)) ** self.beta)
    probabilities = row / np.sum(row)
    return probabilities


### `calculate_probabilities` Function

This function calculates the probabilities of moving to unvisited nodes based on pheromone levels and distances. 

#### Parameters:
- `current_point`: The index of the current node.
- `visited`: A set of indices representing the nodes that have already been visited.

#### Process:
1. **Retrieve Pheromone and Distance**:
   - It extracts the pheromone levels and distances associated with the `current_point` from their respective matrices.

2. **Identify Unvisited Nodes**:
   - A mask is created to identify which nodes have not been visited by using `np.isin` to filter out the visited nodes.

3. **Calculate Probability Row**:
   - The probabilities are calculated using the formula:
     ![image](../Image/import.PNG)

   - Here, `alpha` and `beta` are parameters that control the influence of pheromone and distance, respectively.

4. **Normalize Probabilities**:
   - The computed row is normalized by dividing by the sum of all values in the row to ensure that the probabilities sum to 1.

#### Return:
- The function returns an array with one probability per node. Visited nodes have probability 0, and the probabilities of the unvisited nodes sum to 1.
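
A small worked example makes the weighting concrete. The function below is a stand-alone restatement of the same calculation (the name `transition_probabilities` and the toy rows are hypothetical). With uniform pheromone and `beta=2`, the weights for distances 1, 2, and 4 are proportional to 1, 1/4, and 1/16.

```python
import numpy as np

def transition_probabilities(pheromone_row, dist_row, visited, alpha=1, beta=2):
    """Probability of moving to each node; visited nodes get probability 0."""
    unvisited = ~np.isin(np.arange(len(pheromone_row)), list(visited))
    # The small epsilon keeps the zero self-distance from causing a division by zero.
    attractiveness = (1.0 / (dist_row + 1e-10)) ** beta
    weights = (pheromone_row ** alpha) * unvisited * attractiveness
    return weights / weights.sum()

# Hypothetical rows for current node 0 in a 4-node problem.
pheromone_row = np.array([0.25, 0.25, 0.25, 0.25])
dist_row = np.array([0.0, 1.0, 2.0, 4.0])
probs = transition_probabilities(pheromone_row, dist_row, visited={0})
print(probs.round(4))  # approximately [0, 16/21, 4/21, 1/21]
```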


In [21]:
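def update_pheromone(self, ants_paths):
    # Evaporation: keep only a fraction `decay` of the existing pheromone
    self.pheromone_matrix *= self.decay
    for path in ants_paths:
        for i in range(len(path) - 1):
            # Small epsilon avoids division by zero when two rows are identical
            self.pheromone_matrix[path[i], path[i + 1]] += 1.0 / (self.distance_matrix[path[i], path[i + 1]] + 1e-10)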


### Updating Pheromone Levels (`update_pheromone` function)

The `update_pheromone` function is responsible for modifying the pheromone levels on the paths based on the ants' movements and the distances traveled. Here's a detailed breakdown of how it works:

1. **Decay of Pheromone Levels**:  
   - Every value in the pheromone matrix is multiplied by the decay factor (`self.decay`). This simulates the natural evaporation of pheromones over time, making older paths less attractive for future ants.

2. **Pheromone Update for Each Path**:  
   - The function iterates through each path taken by the ants. 
   - For each path, it examines each segment of the path, specifically the transitions from one point to the next.

3. **Increase Pheromone Levels**:  
   - The pheromone level on the edge between two consecutive points in the path is increased. This is done by the formula:

     ![Pheromone update formula](../Image/imggg.PNG)

   - The amount added is inversely proportional to the distance between these two points. Shorter distances receive a greater pheromone boost, reinforcing paths that are considered more efficient.

4. **Overall Effect**:  
   - By updating the pheromone levels in this way, the algorithm effectively encourages future ants to follow the more successful paths while discouraging them from taking longer, less optimal routes. Over multiple iterations, this helps to converge towards the most efficient solution based on the pheromone trails established by previous ants.
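
The two steps, evaporation followed by deposit, can be traced by hand on a 3-point example. All values here are hypothetical, and `decay = 0.5` is chosen only to keep the arithmetic simple.

```python
import numpy as np

decay = 0.5  # deliberately strong evaporation so the numbers are easy to follow
distance_matrix = np.array([
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
])
pheromone_matrix = np.full((3, 3), 1.0)

ants_paths = [[0, 1, 2], [0, 1, 2]]  # two ants took the same route

# Step 1: evaporation scales every entry by the decay factor.
pheromone_matrix *= decay

# Step 2: each ant deposits 1 / distance on every edge it walked.
for path in ants_paths:
    for a, b in zip(path[:-1], path[1:]):
        pheromone_matrix[a, b] += 1.0 / distance_matrix[a, b]

print(pheromone_matrix[0, 1])  # 0.5 + 2 * (1 / 2) = 1.5
print(pheromone_matrix[1, 2])  # 0.5 + 2 * (1 / 4) = 1.0
print(pheromone_matrix[1, 0])  # 0.5, the reverse edge is not reinforced
```

Only the edge in the direction of travel is updated, so the pheromone matrix becomes asymmetric even though the distance matrix is symmetric.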


In [23]:
def get_shortest_path(self, ants_paths):
    shortest_cost = np.inf
    shortest_path = None
    for path in ants_paths:
        path_cost = self.calculate_path_cost(path)
        if path_cost < shortest_cost:
            shortest_cost = path_cost
            shortest_path = path
    return shortest_path, shortest_cost

### Finding the Shortest Path (`get_shortest_path` function)

The `get_shortest_path` function is designed to identify the most efficient path discovered by the ants during their exploration. Here's a detailed explanation of how it operates:

1. **Initialization of Variables**:  
   - Two variables are initialized: `shortest_cost` is set to infinity (`np.inf`), representing the lowest cost found so far, and `shortest_path` is initialized to `None`, which will hold the best path once found.

2. **Iterate Through Ants' Paths**:  
   - The function loops through each path generated by the ants (`for path in ants_paths:`). 

3. **Calculate Path Cost**:  
   - For each path, the function calls `self.calculate_path_cost(path)`, which sums the distances between each pair of consecutive points in the path.

4. **Check for Shortest Path**:  
   - The function compares the calculated `path_cost` with the current `shortest_cost`. If the `path_cost` is less than `shortest_cost`, it updates `shortest_cost` with the new lower cost and assigns `shortest_path` to the current `path`.

5. **Return the Results**:  
   - After examining all paths, the function returns the best path and its corresponding cost as a tuple: `(shortest_path, shortest_cost)`.

### Overall Effect:
- This function plays a crucial role in determining the most efficient route through the dataset, allowing the Ant Colony Optimization algorithm to focus on the most promising paths based on the costs calculated during the exploration phase.
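
The same selection can be expressed with Python's built-in `min` and a `key` function. This is an equivalent sketch rather than the notebook's code; `path_cost`, `shortest_of`, and the toy data are hypothetical.

```python
def path_cost(path, distance_matrix):
    """Sum of distances between consecutive points in the path."""
    return sum(distance_matrix[a][b] for a, b in zip(path[:-1], path[1:]))

def shortest_of(paths, distance_matrix):
    """Return the cheapest path and its cost; on ties the first path wins."""
    best = min(paths, key=lambda p: path_cost(p, distance_matrix))
    return best, path_cost(best, distance_matrix)

# Hypothetical 3-point distance matrix and three candidate paths.
distance_matrix = [
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
]
paths = [[0, 2, 1], [0, 1, 2], [1, 0, 2]]  # costs 9, 6 and 7
best, cost = shortest_of(paths, distance_matrix)
print(best, cost)  # [0, 1, 2] 6.0
```

One behavioral difference: `min` raises `ValueError` on an empty list, whereas the loop version returns `(None, np.inf)`.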


In [25]:
def calculate_path_cost(self, path):
    path_cost = sum(self.distance_matrix[path[i], path[i + 1]] for i in range(len(path) - 1))
    return path_cost

### Calculating Path Cost (`calculate_path_cost` function)

The `calculate_path_cost` function is responsible for determining the total cost associated with a specific path taken by the ants. Here’s a breakdown of how this function works:

1. **Path Cost Variable**:  
   - The function stores the total in a single variable, `path_cost`. No separate initialization or accumulator loop is needed because the built-in `sum` does the accumulation.

2. **Sum of Distances**:  
   - The core calculation involves summing the distances between consecutive points in the path. This is achieved using a generator expression:
     ```python
     sum(self.distance_matrix[path[i], path[i + 1]] for i in range(len(path) - 1))
     ```
   - Here, `path[i]` and `path[i + 1]` refer to two consecutive points in the path. The distance between these two points is retrieved from `self.distance_matrix`, which contains the precomputed distances between all pairs of points.

3. **Return the Total Cost**:  
   - After calculating the sum of the distances, the function returns the total `path_cost`.

### Overall Effect:
- By calculating the path cost in this way, the function helps to quantify the efficiency of the route taken by the ants. Lower costs indicate more efficient paths, which the Ant Colony Optimization algorithm will seek to reinforce through pheromone updates.
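
As a quick check of the summation, the sketch below computes the cost of one path both with the generator expression and with NumPy index arrays. The matrix and path are hypothetical.

```python
import numpy as np

# Hypothetical symmetric distance matrix for 4 points.
distance_matrix = np.array([
    [0.0, 2.0, 5.0, 1.0],
    [2.0, 0.0, 3.0, 4.0],
    [5.0, 3.0, 0.0, 6.0],
    [1.0, 4.0, 6.0, 0.0],
])
path = [3, 0, 1, 2]

# Generator-expression form, as in calculate_path_cost.
loop_cost = sum(distance_matrix[path[i], path[i + 1]] for i in range(len(path) - 1))

# Vectorized form: pair each point with its successor via index arrays.
idx = np.asarray(path)
vector_cost = distance_matrix[idx[:-1], idx[1:]].sum()

print(loop_cost, vector_cost)  # 6.0 6.0  (edges 3->0, 0->1, 1->2 cost 1 + 2 + 3)
```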
