# Revolutionizing petrochemical chemistry: bioethylene production in *E. coli*

In [1]:
from IPython.display import FileLink
from IPython.display import Markdown

## **1. Introduction**

### **1.1 Literature review of the compound**
##### **Application of the product**
Ethylene use falls into two main categories: 1) as a monomer, from which longer carbon chains are constructed, and 2) as a starting material for other two-carbon compounds. The first of these is the largest use of ethylene, consuming about one-half of the annual output. Polymerization of ethylene gives polyethylene, a polymer particularly used in the production of packaging films, wire coatings, and squeeze bottles [1]. 

Ethylene is the starting material for the preparation of a number of two-carbon compounds including ethanol (industrial alcohol), ethylene oxide (converted to ethylene glycol for antifreeze and polyester fibres and films), acetaldehyde (converted to acetic acid), and vinyl chloride (converted to polyvinyl chloride). In addition to these compounds, ethylene and benzene combine to form ethylbenzene, which is dehydrogenated to styrene for use in the production of plastics and synthetic rubber [1]. 

While most commercially produced ethylene is used as a feedstock in the production of polymers and industrial chemicals, a relatively small amount is used for the controlled ripening of citrus fruits, tomatoes, bananas and many other fruits, vegetables and flowers. Endogenous production of ethylene in plant tissue generally increases rapidly during ripening. Application of ethylene to plants before the time of this natural increase not only initiates the ripening process but also increases endogenous ethylene production. Ethylene has commonly been used in this way since the early part of this century [2].

##### **Evaluation of market potential**
The global ethylene market was valued at USD 101.1 billion in 2020 and is expected to grow at a compound annual growth rate (CAGR) of 5.5% until 2029 [3]. The main growth driver is the growing global working population, which has increased demand for packaged meals and refreshments [4]. Another driver is the expanding automotive sector, where polyethylenes (PEs) are often used for vehicle exteriors, electrical insulation and gasoline tanks [3]. The depletion of raw materials such as fossil fuel resources, along with fluctuating crude oil prices, may restrain market growth. Key players in the industry include China Petroleum & Chemical (Beijing, China), Exxon Mobil Corporation (Texas, US), and Shell International (The Hague, Netherlands) [5]. The price of ethylene is currently USD 0.56/kg.

##### **Biosynthetic pathway/gene**
There are several ways to produce ethylene in *Escherichia coli* (*E. coli*). In this project we focused on introducing the ethylene-forming enzyme (EFE), which produces ethylene from two substrates that the bacterium synthesizes naturally: 2-oxoglutarate and L-arginine [6].
EFE belongs to the family of mononuclear non-heme Fe(II)- and 2-oxoglutarate-dependent oxygenases, a large family of enzymes that use dioxygen and 2-oxoglutarate as co-substrates to perform a variety of oxidative processes [7]. EFE not only catalyzes the decomposition of 2-oxoglutarate to ethylene and three CO2 molecules, but also the C5 hydroxylation of L-arginine coupled to the oxidative decarboxylation of 2-oxoglutarate, yielding succinate, CO2 and L-hydroxyarginine (which can decompose further to guanidine and L-Δ1-pyrroline-5-carboxylate). EFE therefore catalyzes both ethylene formation and arginine hydroxylation, in a ratio of 2:1 respectively, as shown in Figure 1:

![image.png](attachment:image.png)

**Figure 1**. Two reactions catalyzed by EFE. (1) Represents the ethylene formation and (2) represents the hydroxylation of L-arginine [7].






### **1.2 Literature review of the cell factory**
##### **General advantages**
Extensive genomic sequence data for *E. coli*, including data relevant to new metabolic reactions, have been published and are available in public databases [8]. Furthermore, this microorganism has been used extensively in both basic molecular biology and biotechnology for the last 60 years. As a result, *E. coli* has clearly defined physiological and genetic characteristics, which facilitates its genetic modification [9].

*E. coli* is considered the “workhorse” of molecular biology due to its high growth-rate in complex and chemically defined culture media, being able to use a wide range of substrates [10]. Moreover, *E. coli* grows fast in minimal salts medium, having high growth and metabolic rates [9]. It should also be noted that using various carbon sources in defined salt medium, *E. coli* can have both aerobic and anaerobic growth [8].

##### **General disadvantages**
One of the most important downsides of using *E. coli* as a cell factory is that its outer membrane, like that of most Gram-negative bacteria, contains lipopolysaccharides (LPS), which are known as endotoxins. In mammalian hosts, LPS can induce a pyrogenic response and ultimately trigger septic shock [11].

Furthermore, *E. coli* cannot be used as a cell factory for eukaryotic proteins that require post-translational modifications, as it lacks the cellular machinery necessary to perform certain modifications such as N- and O-linked glycosylation, hydroxylation, amidation, sulfation or palmitoylation [12].

In addition, *E. coli* has a limited protein secretion capability, because its secretion machinery has a limited capacity and can become overloaded [13].

Finally, it is important to take into account that *E. coli* is sensitive to organic solvents, which are widely used in bioprocessing [14].

##### **Suitability of the cell factory for the product**
The success of using a microorganism for the industrial production of fuels depends on its ability to quickly convert renewable raw material into fuel with high productivity and at a low price, without the product being toxic to the organism itself. The availability of genetic and molecular tools to engineer existing native pathways or to create new synthetic pathways has made *E. coli* the microorganism of choice for producing biofuels from renewable energy sources [15].

*E. coli* is considered a suitable cell factory for the production of bioethylene because of the advantages stated above. In particular, it is a well-studied bacterium with known metabolic pathways [8], so it can be easily engineered to produce bioethylene via a single-enzyme conversion of common metabolites [6]. Furthermore, scientists have successfully engineered *E. coli* to produce bioethylene by introducing genes from other organisms that are involved in the natural ethylene biosynthesis pathway [16]. In that work, rapid growth and reproduction rates of *E. coli* were obtained under controlled conditions, making it a potentially efficient host for large-scale production [16].

## 2. Problem definition

In the modern era, ethylene serves as the fundamental building block for the production of resins and plastics. Traditional methods for generating ethylene involve the steam cracking of ethane in chemical plants, which requires extreme temperatures and pressures. This process consumes a significant amount of energy and results in the release of substantial carbon dioxide emissions. As a result, there is an urgent need for alternative conversion technologies to enhance the efficiency of chemical manufacturing while mitigating the effects of global warming [17].

To address this need, there is growing interest in developing technology for ethylene production from renewable resources such as CO2 and biomass. Ethylene is naturally produced by plants and certain plant-associated microbes. Microbes employ various metabolic pathways for ethylene production, one of which involves an ethylene-forming enzyme (EFE) that utilizes α-ketoglutarate and arginine as substrates. EFE presents a promising biotechnology target because the expression of a single gene is adequate for ethylene production without generating toxic intermediates [18].

In pursuit of this goal, our strategy is to incorporate the EFE enzyme and the reactions depicted in Figure 1 into a genome-scale model (GSM) of an *E. coli* strain. Our objective is therefore to engineer an *E. coli* cell factory capable of producing ethylene with a high yield, thus offering a sustainable alternative to the conventional chemical production of ethylene.


## 3. Selection and assessment of existing GSM

Many GSMs of *Escherichia coli* are available in the literature, which makes choosing a suitable model for our project a challenge. To compare the available models, we searched the BiGG Models and BioModels databases for *E. coli* models. The first screen favoured models with as many reactions, metabolites and genes as possible; in parallel, we searched the literature to compare the BiGG values with experimental data for each model and to find papers related to our aim. The most cited *E. coli* model is iJO1366 [19], which was used to build the interactive web application Escher; the model (last updated Oct 31, 2019) contains 1,805 metabolites, 2,583 reactions and 1,367 genes and is regularly updated with new data. The other model we considered closely was iML1515, which extends iJO1366 with 72 metabolites, 129 reactions and 149 genes; however, it adds specific pathways and is therefore considered a more specialized model.

To decide which model best suited our aim, we scored these two models together with some of the best models found in BiGG (iECW_1372, iDK1463, iEC1368_DH5a, iECO26_1355, and many more) [20]. All models were scored with memote using `memote report snapshot --filename "report_name.html" path/to/model.xml`. After comparing the reports, we selected iJO1366 and iML1515 for their consistency and annotation scores (metabolites, reactions and genes).

The two models were then compared with `memote report diff`, and the results are shown in Table 1:



**Table 1**. Characteristics of the iJO1366 and iML1515 *E. coli* models.

| *E. coli* model | Total score (%)* | Total metabolites | Total reactions | Total genes | Metabolic coverage | Consistency score (%)* | Annotation-metabolites score (%) | Annotation-reactions score (%) |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| iJO1366 | 91 | 1805 | 2583 | 1367 | 1.89 | 98.18 | 79 | 81 |
| iML1515 | 91 | 1877 | 2712 | 1516 | 1.79 | 98.40 | 79 | 81 |

###### *Values taken from the individual memote report of each GSM, because Unbounded Flux In Default Medium has a value of 0% in the memote diff report, unlike in the single reports.
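As a quick cross-check of Table 1, the metabolic coverage reported by memote is the ratio of reactions to genes. The sketch below recomputes it from the counts in the table and also confirms the size difference between the two models; the dictionary layout and variable names are only illustrative.

```python
# Counts copied from Table 1 (memote snapshot reports).
models = {
    "iJO1366": {"metabolites": 1805, "reactions": 2583, "genes": 1367},
    "iML1515": {"metabolites": 1877, "reactions": 2712, "genes": 1516},
}


def metabolic_coverage(counts):
    """Ratio of reactions to genes (the quantity memote calls metabolic coverage)."""
    return counts["reactions"] / counts["genes"]


# Coverage per model, rounded to two decimals as in Table 1.
coverage = {name: round(metabolic_coverage(c), 2) for name, c in models.items()}

# What iML1515 adds on top of iJO1366.
added = {
    key: models["iML1515"][key] - models["iJO1366"][key]
    for key in ("metabolites", "reactions", "genes")
}

print(coverage)
print(added)
```

Both values reproduce the table, and the differences match the 72 metabolites, 129 reactions and 149 genes quoted in the text.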

In both models, OptFlux was used to assess the presence of necessary metabolites for the future implementation of the pathway (2-oxoglutarate and L-arginine), revealing their presence in both models. 

iJO1366 has been reported to be the best model for improving ethylene production in *E. coli* [21], and it is supported by a vast collection of experimental data used to gap-fill the model [22]. Based on the information collected and the memote report, iJO1366 is a suitable model in which to implement the ethylene pathway, despite the absence of recent reports on further updates to it. We assumed that all available updates to this model are included in the latest version published in BiGG in 2019, so we proceeded to work with it.

In [5]:
FileLink('memote_tests/index_iJO1366.html')

In [6]:
FileLink('memote_tests/index_iML1515.html')

In [7]:
FileLink('memote_tests/compare_iML1515_iJO1366.html')

## 4. Computer-Aided Cell Factory Engineering

### 4.1. Gene Knock-out strategies

Studying the effects of gene knockouts in a cell factory computationally provides a powerful and efficient means to design and optimize microbial strains for industrial production. It could complement experimental work by guiding the selection of genetic modifications and offering valuable insights into the complex interactions within cellular metabolism [23].

Knocking out some genes could improve the yield of our target product by eliminating competing pathways or redirecting metabolic flux, and could also help to reduce the formation of undesirable byproducts.
For this reason, we studied which genes could be knocked out to improve ethylene production in our strain in two different ways: using two functions of the cameo library, and manually knocking out genes of interest.

**Cameo Knock out strategies**

We used two different functions: OptGene and OptKnock.
OptGene searches for gene or reaction knockouts using evolutionary algorithms. Knockouts are introduced randomly to create a mutant population, and at every iteration the best 50 individuals found overall are kept, which generates a library of targets.
OptKnock was one of the first tools developed to provide an optimal knockout framework; it considers how fluxes are optimally distributed when genes are knocked out. The result is a list of knockouts that, when applied in silico, yield a strain in which product synthesis occurs at maximum growth [24].

However, neither method identified any gene or reaction knockout that improved ethylene production for our target reaction.


In [3]:
FileLink('analysis/Knock_out.ipynb')

**Manually derived Knock-out strains**

All the genes involved in the manually derived knock-outs were taken from [25] and are shown in Figure 2. After confirming that these genes are present in our model, we knocked out the reactions associated with them (knocking out the genes alone would not affect the model). The flux and yield of the EFE reaction, as well as biomass production, were then calculated before and after each knock-out. None of the single knock-outs improved ethylene production or yield.

![image.png](attachment:image.png)

**Figure 2:** Pathway knock-out representation.

**Table 2**. Biomass productivity and EFE reaction productivity for the manual knock-outs.

| KO name | Gene | Reaction involved | Maximum theoretical biomass productivity (1/h) | Maximum theoretical productivity of EFE reaction (mmol/gDW*h) |
| :-: | :-: | :-: | :-: | :-: |
| No KO | - | - | 0.9823718127269851 | 12.780573951434883 |
| Strain_1_arginine | b2938 | RGDCpp | 0.9823718127269903 | 12.78057395143489 |
| Strain_2_gdhA | b1761 | GLUDy | 0.9487730068947992 | 12.614889336016203 |
| Strain_3_gltBD | b3212 | GLUSy | 0.9823718127269903 | 12.78057395143489 |
| Strain_4_sucA | b0726 | AKGDH | 0.9823718127269851 | 12.780573951434883 |
| Strain_KO | b2938 / b1761 / b3212 / b0726 | RGDCpp / GLUDy / GLUSy / AKGDH | 1.8103630703219057e-15 | 12.515189873417665 |



We then created a strain combining all four knock-outs. This drastically decreased biomass production, which would probably mean cell death, and also slightly decreased ethylene production.
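To make the differences in Table 2 easier to read, the values can be expressed as percentage changes relative to the strain without knock-outs. This is a small post-processing sketch on the numbers copied from the table; the helper name `percent_change` is our own.

```python
# Values copied from Table 2: (biomass productivity in 1/h, EFE flux in mmol/gDW*h).
results = {
    "No KO": (0.9823718127269851, 12.780573951434883),
    "Strain_1_arginine": (0.9823718127269903, 12.78057395143489),
    "Strain_2_gdhA": (0.9487730068947992, 12.614889336016203),
    "Strain_3_gltBD": (0.9823718127269903, 12.78057395143489),
    "Strain_4_sucA": (0.9823718127269851, 12.780573951434883),
    "Strain_KO": (1.8103630703219057e-15, 12.515189873417665),
}


def percent_change(value, reference):
    """Relative change of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


ref_growth, ref_efe = results["No KO"]
changes = {
    name: (percent_change(growth, ref_growth), percent_change(efe, ref_efe))
    for name, (growth, efe) in results.items()
    if name != "No KO"
}

for name, (d_growth, d_efe) in changes.items():
    print(f"{name:18s} growth {d_growth:+8.2f} %   EFE {d_efe:+6.2f} %")
```

Only the gdhA knock-out and the combined strain differ from the reference beyond numerical noise, and in both cases the EFE flux goes down rather than up.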

In [5]:
FileLink('analysis/Manually_derived_KO/Strain_1_arginine.ipynb')

In [6]:
FileLink('analysis/Manually_derived_KO/Strain_2_gdhA.ipynb')

In [7]:
FileLink('analysis/Manually_derived_KO/Strain_3_gltBD.ipynb')

In [8]:
FileLink('analysis/Manually_derived_KO/Strain_4_sucA.ipynb')

In [9]:
FileLink('analysis/Manually_derived_KO/Strain_KO.ipynb')

### 4.2. Over-expression and down-regulation strategy

Overexpressing essential genes in metabolic pathways is a commonly employed method to enhance the production of a desired molecule. As the increased production of naturally synthesized compounds typically involves genetic modification to metabolic pathways, a crucial initial step is to identify the critical pathways and specific genes to guide strategies for overexpressing or downregulating genes in subsequent manipulations [26].

Therefore, we used the Flux Scanning based on Enforced Objective Flux (FSEOF) tool of cameo to identify which genes could potentially be upregulated or downregulated to improve our target reaction. FSEOF scans all the metabolic fluxes in the metabolic model and selects fluxes that change when the flux toward product formation is enforced (gradually increased) as an additional constraint during flux analysis [27].
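The selection step of FSEOF can be illustrated without a solver. The sketch below is not cameo's implementation: it assumes the flux of each reaction has already been recorded at a series of increasing enforced product fluxes, and it only classifies the resulting profiles. The reaction names `RXN_CONST` and `RXN_MIXED` and all flux values are made up for illustration and are not output of our model.

```python
def classify_fseof(profiles, tol=1e-9):
    """Classify reactions from flux profiles recorded at increasing enforced product flux.

    'up'   : flux magnitude rises monotonically without changing direction
    'down' : flux magnitude falls monotonically without changing direction
    'none' : constant, non-monotonic, or sign-changing profiles
    """
    targets = {}
    for rxn, fluxes in profiles.items():
        changes_sign = min(fluxes) < -tol and max(fluxes) > tol
        magnitudes = [abs(v) for v in fluxes]
        steps = [b - a for a, b in zip(magnitudes, magnitudes[1:])]
        if changes_sign or all(abs(s) <= tol for s in steps):
            targets[rxn] = "none"
        elif all(s >= -tol for s in steps):
            targets[rxn] = "up"
        elif all(s <= tol for s in steps):
            targets[rxn] = "down"
        else:
            targets[rxn] = "none"
    return targets


# Hypothetical fluxes (mmol/gDW*h) at four increasing enforced ethylene fluxes.
example_profiles = {
    "ALATA_L": [0.1, 0.4, 0.9, 1.5],     # magnitude increases
    "ASAD": [-0.8, -0.6, -0.3, -0.1],    # magnitude decreases
    "RXN_CONST": [2.0, 2.0, 2.0, 2.0],   # unaffected by the enforced flux
    "RXN_MIXED": [1.0, -0.5, 0.7, 0.2],  # changes direction
}

fseof_targets = classify_fseof(example_profiles)
print(fseof_targets)
```

Reactions classified as `up` would be candidates for overexpression and those classified as `down` candidates for downregulation, which is how the lists in this section should be read.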

Using the FSEOF tool, we obtained 90 genes that could be upregulated or downregulated to increase ethylene production. After filtering the data, we ended up with a list of 3 genes that could be downregulated and 6 genes that could be upregulated.

**Genes for downregulation**:

The potential genes to downregulate are represented in Table 3:

**Table 3.** Potential genes to downregulate obtained with FSEOF tool.

| Enzyme ID| Enzyme name |
| :-: | :-: |
| ASAD | Aspartate-semialdehyde dehydrogenase |
| HSDy | Homoserine dehydrogenase (NADPH) |
| PUNP1| Purine-nucleoside phosphorylase (Adenosine) |

We did not find any metabolic connection between PUNP1 and ethylene production. For both ASAD and HSDy, however, downregulation would increase the availability of aspartate and therefore of 2-oxoglutarate (Figure 3), the substrate of the ethylene-forming reaction.

![image.png](attachment:image.png)

**Figure 3**. Arginine biosynthesis in our host, showing how an increase in aspartate leads to the production of 2-oxoglutarate [28].

**Aspartate-semialdehyde dehydrogenase (ASAD)** catalyses the reductive dephosphorylation of aspartyl phosphate to aspartate semialdehyde (Figure 4). Downregulating ASAD would lead to an accumulation of aspartyl phosphate, and hence of aspartate. The increase in aspartate would in turn increase oxaloacetate (via aspartate aminotransferase), leading to an overproduction of 2-oxoglutarate, the substrate of our EFE reaction for producing ethylene.

![image.png](attachment:image.png)

**Figure 4**. Conversion of aspartyl phosphate to aspartate semialdehyde catalyzed by ASAD [29].

**Homoserine dehydrogenase (HSDy)** catalyzes the conversion of aspartate semialdehyde to homoserine (Figure 5). Again, downregulating HSDy would lead to an increase in aspartate and therefore in 2-oxoglutarate.

![image.png](attachment:image.png)
**Figure 5**. Reversible conversion between L-aspartate semialdehyde and L-homoserine catalyzed by HSDy [30].

**Genes for overexpression**:

The potential genes to upregulate are represented in Table 4:

**Table 4.** Potential genes to upregulate obtained with FSEOF tool.

| Enzyme ID | Enzyme name |
| :-: | :-: |
| ALATA_L | L-alanine transaminase |
| IPDDI | Isopentenyl-diphosphate D-isomerase |
| MEOHtex | Methanol transport via diffusion (extracellular to periplasm) |
| MEOHtrpp| Methanol reversible transport via diffusion (periplasm) |
| PPK | Polyphosphate kinase |
| THRt2rpp | L-threonine reversible transport via proton symport (periplasm) |

In [12]:
FileLink('analysis/Gene_prediction_targets.ipynb')

Among these enzymes, **L-alanine transaminase (ALATA_L)** stands out: it catalyzes the transfer of an amino group from alanine to 2-oxoglutarate (Figure 6). Overexpressing this enzyme could therefore increase the availability of the substrate of our target reaction.

![image.png](attachment:image.png)

**Figure 6**. Action of the ALATA_L enzyme [31].


### 4.3. Media optimization strategy

In this part, we try to increase the growth rate and productivity of the strain by 'conventional' means that do not involve editing the genome. It is well known that the composition of the media and the parameters at which the fermentation is run (pH, temperature and oxygen supply) can greatly influence the growth rate and productivity of a strain. Unfortunately, we cannot simulate the process parameters within the model. However, we can experiment with the media composition to increase the growth rate and yield.

The baseline media defined by the model provides glucose, oxygen, nitrogen (in the form of ammonia) and a number of trace elements. With this baseline media the model has a growth rate of 0.98 /h and an EFE_m productivity of 12.8 mmol/gDW*h. This is the benchmark against which every media change is compared.
To screen which amino acids are most important for the model to (1) grow and (2) run the EFE_m reaction, a fractional factorial design (2^(21-16)) was set up using JMP. The 21 factors, tested at 2 levels, were glucose and the 20 common amino acids. This design resulted in 32 unique runs. Main effects are heavily confounded with two-factor interactions in such a design, but it was the only practical way to test so many factors at once. The design was neither replicated, randomized, nor blocked, because the model is inherently deterministic: the values are calculated and do not change when a run is repeated, so there is no variance.
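The structure of such a 2^(21-16) design can be sketched in a few lines. The example below builds a full factorial in 5 base factors (32 runs) and derives the remaining 16 columns as products of base columns. The choice of generators here is arbitrary and is an assumption for illustration; the generators JMP actually used for our design are not reproduced. Reading -1 and +1 as the low and high level of a component in the media is likewise only illustrative.

```python
from itertools import combinations, product
from math import prod

N_BASE = 5       # base factors, giving 2**5 = 32 runs
N_FACTORS = 21   # glucose plus the 20 common amino acids

# Full two-level factorial in the base factors, coded as -1 / +1.
base_runs = list(product((-1, 1), repeat=N_BASE))

# Candidate generators: every interaction of two or more base factors.
generators = [
    combo
    for size in range(2, N_BASE + 1)
    for combo in combinations(range(N_BASE), size)
]

# Arbitrary choice of the 16 generated columns (not necessarily JMP's choice).
chosen = generators[: N_FACTORS - N_BASE]

# Each run: 5 base levels followed by 16 levels derived from them.
design = [
    list(run) + [prod(run[i] for i in combo) for combo in chosen]
    for run in base_runs
]

print(len(design), "runs x", len(design[0]), "factors")
```

Because every generated column is a product of base columns, each factor is still tested equally often at both levels, but the main effects of the generated factors are aliased with interactions of the base factors, which is the confounding mentioned above.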

After analyzing the data we found that, out of the 20 amino acids tested, 11 had a large impact on the growth rate and EFE_m productivity: Ala, Arg, Asn, Asp, Cys, Gln, Glu, Pro, Ser, Thr and Trp. The growth rate with this media was 17.3 /h and the EFE_m productivity was 225.4 mmol/gDW*h, which is roughly an 18-fold increase over the benchmark in both cases.

Furthermore, the crucial amino acids were tested with (1) sucrose and (2) fructose instead of glucose. Using sucrose increased the growth rate and productivity to 19.2 /h and 277 mmol/gDW*h, respectively.

Looking at the theoretical maximum yield with the baseline media, ~1.3 mmol of ethylene could be produced from 1 mmol of glucose. Switching to the (1) glucose-optimized and (2) sucrose-optimized media improves this further, giving theoretical maximum yields of (1) 5.6 mmol_eth/mmol_glc and (2) 6.9 mmol_eth/mmol_sucr, and cmol yields of (1) 1.9 cmol_eth/cmol_glc and (2) 1.2 cmol_eth/cmol_sucr. These results are shown in Table 5:

**Table 5:** Comparison of the baseline media, best reduced glucose media and best reduced sucrose media.

|  | Baseline Media | Best Reduced Glucose Media | Best Reduced Sucrose Media |
| :-: | :-: | :-: | :-: |
| Growth-rate [1/h] | 0.98 | 17.3 | 19.2 |
| EFE_m Productivity [mmol/gDW*h] | 12.8 | 225.4 | 277 |
| Yield [mmol_eth/mmol_Csource] | 1.3 | 5.6 | 6.9 |
| Cmol Yield [cmol_eth/cmol_Csource] | 0.4 | 1.9 | 1.2 |
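The fold changes and C-mol yields in Table 5 follow directly from the molar values. The sketch below recomputes them, assuming 2 carbon atoms for ethylene, 6 for glucose and 12 for sucrose; the dictionary keys are illustrative names.

```python
# Carbon atoms per molecule.
CARBONS = {"ethylene": 2, "glucose": 6, "sucrose": 12}

# Values copied from Table 5.
media = {
    "baseline": {"source": "glucose", "growth": 0.98, "efe": 12.8, "yield": 1.3},
    "glucose_opt": {"source": "glucose", "growth": 17.3, "efe": 225.4, "yield": 5.6},
    "sucrose_opt": {"source": "sucrose", "growth": 19.2, "efe": 277.0, "yield": 6.9},
}


def cmol_yield(molar_yield, source):
    """Convert mol ethylene / mol carbon source into C-mol / C-mol."""
    return molar_yield * CARBONS["ethylene"] / CARBONS[source]


base = media["baseline"]
summary = {
    name: {
        "cmol_yield": cmol_yield(m["yield"], m["source"]),
        "growth_fold": m["growth"] / base["growth"],
        "efe_fold": m["efe"] / base["efe"],
    }
    for name, m in media.items()
}

for name, s in summary.items():
    print(
        f"{name:12s} cmol yield {s['cmol_yield']:.2f}   "
        f"growth x{s['growth_fold']:.1f}   EFE x{s['efe_fold']:.1f}"
    )
```

A C-mol yield above 1 on the sugar alone indicates that part of the carbon in the ethylene comes from the supplemented amino acids, which are not counted in the denominator.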

In [6]:
FileLink('models/media_optimization.ipynb')

In [7]:
FileLink('models/maximum_yield.ipynb')

### 4.4. Cofactor swap strategy

Even though our main reaction (the EFE reaction) does not use NADH or NADPH as a cofactor, we wanted to analyse whether a cofactor swap in the reactions that lead to our precursors would increase ethylene production. We found that a cofactor swap in the GADP, ACALD and GLUDy reactions, among others, would significantly improve the ethylene yield. These reactions are catalysed by enzymes that produce glyceraldehyde 3-phosphate, acetyl-CoA and 2-oxoglutarate respectively, all of them precursors of ethylene. Moreover, 2-oxoglutarate is the substrate of the main EFE reaction that produces ethylene (Figure 1). Although we found no supporting evidence in the literature, a cofactor swap in these reactions, particularly GLUDy, could be studied further with the aim of achieving higher ethylene production.

**Table 6.** Reactions for cofactor swap ordered by fitness improvement.

| Targets | Fitness |
| :-: | :-: |
| GADP | 0.432096 |
| ACALD | 0.429814 |
| GLUDy | 0.429203 |


In [3]:
FileLink('analysis/Cofactor swap targets.ipynb')

### 4.5. Heterologous pathways

To computationally enumerate the potential heterologous pathways for ethylene production, we analyzed our model with the cameo functions `predictor = pathway_prediction.PathwayPredictor(model)` and `pathways = predictor.run(product="eth_c", max_predictions=4)` [32]. This analysis did not produce any result because the script never finished running. To work around this problem, we also tried another cameo function, `report = api.design(product='2-oxoglutarate', view=mp_view)` [33], which uses standard models to find other potential pathways. In the end, we were not able to get the Python code for the cameo analysis to work with our modified model. Even when we ran the code for the production of 2-oxoglutarate, which is already present in the iJO1366 model, and increased the timeout, no pathways were found.


In [8]:
FileLink('analysis/Heterologous_pathways/Hetero_pathways.ipynb')

In [9]:
FileLink('analysis/Heterologous_pathways/hetero_path2.ipynb')

### 4.6. FBA and dFBA strategy

 

Flux balance analysis (FBA) and dynamic FBA (dFBA) were performed. FBA showed that the implementation of the reactions was successful: about 70% of the alpha-ketoglutarate flux is consumed by our EFE_m reaction (shown below).

In [21]:
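from cobra.io import read_sbml_model
from cobra import Reaction, Metabolite
# Load the modified model that contains the EFE reactions
model = read_sbml_model('models/modified_model.xml')
# Maximize the flux through the ethylene-forming reaction
model.objective = model.reactions.EFE_m
# Show which reactions produce and consume 2-oxoglutarate (akg_c)
model.metabolites.akg_c.summary()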

Producing Reactions
Percent,Flux,Reaction,Definition
28.71%,5.147,ASPTA,akg_c + asp__L_c <=> glu__L_c + oaa_c
71.29%,12.78,ICDHyr,icit_c + nadp_c <=> akg_c + co2_c + nadph_c

Consuming Reactions
Percent,Flux,Reaction,Definition
71.29%,-12.78,EFE_m,akg_c + 0.5 o2_c --> 3.0 co2_c + eth_c
28.71%,-5.147,GLUDy,glu__L_c + h2o_c + nadp_c <=> akg_c + h_c + nadph_c + nh4_c
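The percentages in this summary are simply each flux divided by the total flux through 2-oxoglutarate. A short sketch that recomputes them from the rounded fluxes printed above and checks that production and consumption balance, as required at steady state (the helper name `shares` is our own):

```python
# Fluxes through 2-oxoglutarate (akg_c) from the summary above, in mmol/gDW*h.
producing = {"ICDHyr": 12.78, "ASPTA": 5.147}
consuming = {"EFE_m": 12.78, "GLUDy": 5.147}


def shares(fluxes):
    """Percentage of the total flux carried by each reaction."""
    total = sum(fluxes.values())
    return {rxn: 100.0 * flux / total for rxn, flux in fluxes.items()}


production_share = shares(producing)
consumption_share = shares(consuming)

# At steady state the metabolite is produced and consumed at the same rate.
imbalance = sum(producing.values()) - sum(consuming.values())

for rxn, pct in consumption_share.items():
    print(f"{rxn:6s} consumes {pct:.2f} % of the akg_c flux")
print("imbalance:", imbalance)
```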


The dFBA also worked up to a certain point, as can be seen in Figure 7. However, we were not able to (1) run the simulation until glucose was depleted, because prolonged calculations led to an infeasible problem, or (2) simulate the production of ethylene, because we lacked the kinetic constants for the EFE_m reaction. We also tried running a dFBA with the dfba package, but this attempt failed even after downgrading Python and consulting the package documentation.

![dFBA](currently_best_dFBA.png)

**Figure 7.** dFBA showing biomass and glucose over 4 time units.
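The time-stepping loop behind a dFBA can be illustrated without a solver. The toy sketch below is not the code in our notebook: the FBA solve at each step is replaced by a fixed biomass yield, and all parameter values (`vmax`, `km`, `yield_x`, initial concentrations) are hypothetical. It shows one common safeguard near substrate depletion, which is to cap the uptake rate by the amount of glucose still available in the time step; whether this would resolve the infeasibility we observed has not been tested.

```python
def simulate(biomass=0.01, glucose=10.0, dt=0.1, t_end=20.0,
             vmax=10.0, km=0.5, yield_x=0.09):
    """Euler time stepping of biomass (gDW/L) and glucose (mmol/L).

    vmax (mmol/gDW*h), km (mmol/L) and yield_x (gDW/mmol) are hypothetical.
    In a real dFBA the growth rate would come from an FBA solve with the
    glucose uptake bound set to `uptake`, instead of from yield_x.
    """
    t = 0.0
    trajectory = [(t, biomass, glucose)]
    while t < t_end - 1e-12 and glucose > 1e-9:
        # Kinetic (Michaelis-Menten type) bound on the specific uptake rate.
        uptake = vmax * glucose / (km + glucose)
        # Never take up more glucose than is left in this time step.
        uptake = min(uptake, glucose / (biomass * dt))
        # Stand-in for the optimum of the FBA problem.
        growth = yield_x * uptake
        glucose = max(glucose - uptake * biomass * dt, 0.0)
        biomass += growth * biomass * dt
        t += dt
        trajectory.append((t, biomass, glucose))
    return trajectory


trajectory = simulate()
t_final, biomass_final, glucose_final = trajectory[-1]
print(f"stopped at t = {t_final:.1f} h with biomass {biomass_final:.3f} gDW/L")
```

With the cap in place the glucose concentration reaches zero without becoming negative, and the loop stops at depletion instead of requesting an uptake the medium cannot supply.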

In [8]:
FileLink('models/dFBA.ipynb')

### 4.7. Other strategies
#### Phenotype phase plane analysis

With this strategy we analysed how our cell factory responds to changes in glucose (Glc) uptake, the principal carbon source, and in oxygen availability. We analysed these effects on biomass and ethylene production. As expected, the results show that cell growth is limited by Glc and oxygen under the simulated aerobic conditions, and that ethylene production is also limited by the available Glc (the carbon source for the precursors needed for its synthesis) and by oxygen (the EFE reaction uses O2 as a co-substrate).

#### Escher 

We tried to display our modified model, including the fluxes, on an Escher map. However, we did not manage to produce the JSON file or to run the Escher code.

In [4]:
FileLink('analysis/Phenotype_phase_plane_analysis.ipynb')

In summary, our main strategies for improving ethylene production would be media optimization, upregulation and downregulation of genes, and a targeted cofactor swap, as these were the only strategies predicted to improve the ethylene yield in our model.

## 5. Discussion

As described above, we followed five different strategies to try to overproduce ethylene: heterologous pathways, knock-outs, media optimization, gene target prediction and cofactor swap.
Regarding heterologous pathways, although 2-oxoglutarate is involved in pathways other than the TCA cycle, problems with the code prevented us from computationally enumerating the pathways that could lead to ethylene production from a substrate other than glucose.
Although we used the OptGene and OptKnock functions of cameo to look for knock-outs that improve ethylene production in our host, we found no genes or reactions whose knock-out improved the yield. Moreover, all of the manually selected knock-outs have been shown experimentally to increase ethylene production, especially GLUDy, the reaction with the second-highest consumption of 2-oxoglutarate in *E. coli* metabolism. Given this discrepancy with the experimental data, our results suggest either that the model needs further refinement before it can assess knock-outs in core metabolism, or that the model compensates for the knock-outs through alternative routes, since biomass production was also only slightly affected or not affected at all.
Regarding the media optimization, we can conclude that it worked very well, as we were able to achieve a 22-fold improvement of the EFE_m productivity. The 32 runs were sufficient to reveal the important amino acids for biomass and EFE_m productivity. Some of the important amino acids are directly connected to the TCA cycle, so adding them to the media could lead to more alpha-ketoglutarate and therefore more substrate for EFE_m. Other amino acids are produced directly from alpha-ketoglutarate, so adding them frees up more alpha-ketoglutarate for EFE_m. However, some of the amino acids are not directly connected to the TCA cycle, which is why we are unsure what causes their effect on EFE_m.
Furthermore, we found 3 enzymes that could be downregulated and 6 that could be overexpressed to significantly increase ethylene production. Among these, we highlight the downregulation of ASAD and HSDy, which would increase the production of aspartate and therefore of 2-oxoglutarate, the substrate for ethylene production. In addition, the overexpression of ALATA_L would increase the transformation of alanine to 2-oxoglutarate, also increasing ethylene production. Nevertheless, when we tried to simulate the downregulation of these genes by changing the lower and upper bounds of the corresponding reactions, we did not obtain an increase in the ethylene yield.
As for the cofactor swap analysis, we can conclude that changing the cofactor of the reaction leading to 2-oxoglutarate, the substrate of the main EFE reaction, would improve the ethylene yield. However, we cannot support this result with the literature, so further investigation is needed.
This project corresponds to the 'Design' part of the 'Design-Build-Test-Learn' cycle. The next step would be to build an *E. coli* strain with the added reactions in the laboratory and test whether the in vivo yield matches the in silico yield.

## 6. Conclusion

All in all, we are happy to report that we were able to successfully implement the reactions related to the production of ethylene in our model while maintaining mass and charge balance. Additionally, we were able to improve the production of ethylene from a baseline productivity of ~12.8 mmol/gDW*h to ~277 mmol/gDW*h. This was purely due to media optimization. On the one hand, it is good to see such a big improvement with an easy-to-implement solution. On the other hand, it was a bit disappointing to see no improvements through gene regulation or knock-outs. Improvement through knock-outs requires more attention and research in the future.

Even though we were able to improve the ethylene productivity, back-of-the-envelope calculations of the cost of 1 tonne of ethylene showed that we are far from economic feasibility. A more sustainable production of ethylene in *E. coli* would only become economically feasible if the productivity is increased drastically or if the cost of petrochemically produced ethylene rises.
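One way to frame such a back-of-the-envelope estimate is to ask how cheap the substrate would have to be. The sketch below is not the calculation in the linked notebook: it considers substrate cost only, uses the baseline theoretical maximum yield from Table 5 and the ethylene price from Section 1.1, and ignores all other costs (amino acid supplementation, energy, downstream processing). The molar masses are standard values for C2H4 and C6H12O6.

```python
M_ETHYLENE = 28.05    # g/mol, C2H4
M_GLUCOSE = 180.16    # g/mol, C6H12O6

ETHYLENE_PRICE = 0.56  # USD/kg, price quoted in Section 1.1
MOLAR_YIELD = 1.3      # mol ethylene / mol glucose, baseline maximum (Table 5)

# Mass yield in kg ethylene per kg glucose.
mass_yield = MOLAR_YIELD * M_ETHYLENE / M_GLUCOSE

# Glucose needed for one tonne (1000 kg) of ethylene, in kg.
glucose_per_tonne = 1000.0 / mass_yield

# Glucose price at which substrate cost alone equals the ethylene revenue.
break_even_glucose_price = ETHYLENE_PRICE * mass_yield

print(f"mass yield: {mass_yield:.3f} kg ethylene / kg glucose")
print(f"glucose per tonne of ethylene: {glucose_per_tonne:.0f} kg")
print(f"break-even glucose price: {break_even_glucose_price:.3f} USD/kg")
```

Even at the theoretical maximum yield, roughly five tonnes of glucose are needed per tonne of ethylene, so the substrate alone would have to cost well below the ethylene price per kilogram for the process to break even.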

In [23]:
FileLink('analysis/economical_feasability.ipynb')

## References
[1] International Agency for Research on Cancer. (1994). Ethylene. Some Industrial Chemicals - NCBI Bookshelf. https://www.ncbi.nlm.nih.gov/books/NBK507450/

[2] Carey, F. A. (2023, November 30). Ethylene | Structure, Sources, Production, uses, & Facts. Encyclopedia Britannica. https://www.britannica.com/science/ethylene

[3] Ethylene Market Size, Share, Growth Rate, Forecast, Report, 2032. (n.d.). https://www.fortunebusinessinsights.com/ethylene-market-104532

[4] Ethylene market research report, share & forecast by 2023 - 2032. (n.d.). Polaris. https://www.polarismarketresearch.com/industry-analysis/ethylene-market

[5] Mike. (2023, September 17). Ethylene price index. Businessanalytiq. https://businessanalytiq.com/procurementanalytics/index/ethylene-price-index/

[6] https://core.ac.uk/download/pdf/147243345.pdf

[7] Xue, J., Lu, J., & Lai, W. (2019). Mechanistic insights into a non-heme 2-oxoglutarate-dependent ethylene-forming enzyme: selectivity of ethylene-formation versus L-Arg hydroxylation. Physical Chemistry Chemical Physics, 21(19), 9957–9968. https://doi.org/10.1039/c9cp00794f

[8] Wang, C., Pfleger, B. F., & Kim, S. (2017). Reassessing Escherichia coli as a cell factory for biofuel production. Current Opinion in Biotechnology, 45, 92–103. https://doi.org/10.1016/j.copbio.2017.02.010

[9] Li, P., Zhu, X., Tan, Z., Zhang, X., & Ma, Y. (2015). Construction of Escherichia coli cell factories for production of organic acids and alcohols. In Advances in Biochemical Engineering / Biotechnology (pp. 107–140). https://doi.org/10.1007/10_2014_294

[10] Valle, A., & Bolívar, J. (2021). Escherichia coli, the workhorse cell factory for the production of chemicals. In Elsevier eBooks (pp. 115–137). https://doi.org/10.1016/b978-0-12-821477-0.00012-x

[11] Mamat, U., Wilke, K., Bramhill, D., Schromm, A. B., Lindner, B., Kohl, T. A., Corchero, J. L., Villaverde, A., Schaffer, L., Head, S. R., Souvignier, C., Meredith, T. C., & Woodard, R. W. (2015). Detoxifying Escherichia coli for endotoxin-free production of recombinant proteins. Microbial Cell Factories, 14(1). https://doi.org/10.1186/s12934-015-0241-5

[12] Man, P. (n.d.). Post Translational Modifications: what expression system to choose? https://info.gbiosciences.com/blog/post-translational-modifications-what-expression-system-to-choose

[13] Horga, L. G., Halliwell, S., Castiñeiras, T. S., Wyre, C., Matos, C. F., Yovcheva, D. S., Kent, R., Morra, R., Williams, S. G., Smith, D. C., & Dixon, N. (2018). Tuning recombinant protein expression to match secretion capacity. Microbial Cell Factories, 17(1). https://doi.org/10.1186/s12934-018-1047-z

[14] Ikehata, Y., & Doukyu, N. (2022). Improving the organic solvent tolerance of Escherichia coli with vanillin, and the involvement of an AcrAB-TolC efflux pump in vanillin tolerance. Journal of Bioscience and Bioengineering, 133(4), 347–352. https://doi.org/10.1016/j.jbiosc.2021.12.015

[15] Koppolu, V., & Vasigala, V. K. (2016). Role of Escherichia coli in Biofuel Production. Microbiology Insights, 9, MBI.S10878. https://doi.org/10.4137/mbi.s10878

[16] Lynch, S., Eckert, C. A., Yu, J., Gill, R. T., & Maness, P. (2016). Overcoming substrate limitations for improved production of ethylene in E. coli. Biotechnology for Biofuels, 9(1). https://doi.org/10.1186/s13068-015-0413-x

[17] Producing ethylene through a more environmentally safe process. (n.d.). College of Engineering and Computing. https://sc.edu/study/colleges_schools/engineering_and_computing/news_events/news/2021/producing_ethylene_environmentally_safe_process.php#:~:text=The%20conventional%20process%20to%20produce,carbon%20dioxide%2C%20a%20greenhouse%20gas

[18] Eckert, C. A., Xu, W., Xiong, W., Lynch, S., Ungerer, J., Tao, L., Gill, R. T., Maness, P., & Yu, J. (2014). Ethylene-forming enzyme and bioethylene production. Biotechnology for Biofuels, 7(1). https://doi.org/10.1186/1754-6834-7-33

[19] Orth, J. D., Conrad, T. M., Na, J., Lerman, J. A., Nam, H., Feist, A. M., & Palsson, B. Ø. (2011). A comprehensive genome-scale reconstruction of Escherichia coli metabolism--2011. Molecular systems biology, 7, 535. https://doi.org/10.1038/msb.2011.65 

[20] http://bigg.ucsd.edu/  

[21] Erickson, K. E., Gill, R. T., & Chatterjee, A. (2014). CONSTRICTOR: constraint modification provides insight into design of biochemical networks. PloS one, 9(11), e113820. https://doi.org/10.1371/journal.pone.0113820  

[22] Orth, J. D., & Palsson, B. (2012). Gap-filling analysis of the iJO1366 Escherichia coli metabolic network reconstruction for discovery of metabolic functions. BMC systems biology, 6, 30. https://doi.org/10.1186/1752-0509-6-30 

[23] Patil, K. R., Rocha, I., Förster, J., & Nielsen, J. (2005). Evolutionary programming as a platform for in silico metabolic engineering. BMC Bioinformatics, 6(1). https://doi.org/10.1186/1471-2105-6-308 

[24] Burgard, A. P., Pharkya, P., & Maranas, C. D. (2003). Optknock: A bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnology and Bioengineering, 84(6), 647–657. https://doi.org/10.1002/bit.10803 

[25] Lynch, S., Eckert, C. A., Yu, J., Gill, R. T., & Maness, P. (2016). Overcoming substrate limitations for improved production of ethylene in E. coli. Biotechnology for Biofuels, 9(1). https://doi.org/10.1186/s13068-015-0413-x

[26] Wang, X., Yu, L., & Chen, S. (2017). UP Finder: A COBRA toolbox extension for identifying gene overexpression strategies for targeted overproduction. Metabolic Engineering Communications, 5, 54–59. https://doi.org/10.1016/j.meteno.2017.08.001

[27] Predict expression modulation targets — cameo 0.9.0b1+11.gb21c40b.dirty documentation.(n.d.). https://cameo.readthedocs.io/en/latest/06-predict-gene-modulation-targets.html?highlight=fseof 

[28] https://www.kegg.jp/pathway/eco00220

[29] Teakel, S. L., Fairman, J., Muruthi, M. M., Abendroth, J., Dranow, D. M., Lorimer, D., Myler, P. J., Edwards, T. E., & Forwood, J. K. (2022). Structural characterization of aspartate-semialdehyde dehydrogenase from Pseudomonas aeruginosa and Neisseria gonorrhoeae. Scientific Reports, 12(1). https://doi.org/10.1038/s41598-022-17384-9 

[30] Kim, D. H., Nguyen, Q. T., Ko, G. S., & Yang, J. K. (2020). Molecular and Enzymatic Features of Homoserine Dehydrogenase from Bacillus subtilis. Journal of Microbiology and Biotechnology, 30(12), 1905–1911. https://doi.org/10.4014/jmb.2004.04060 

[31] Kendziorek, M., Paszkowski, A., & Zagdańska, B. (2012). Biochemical characterization and kinetic properties of alanine aminotransferase homologues partially purified from wheat (Triticum aestivum L.). Phytochemistry, 82, 7–14. https://doi.org/10.1016/j.phytochem.2012.07.008

[32] https://cameo.bio/07-predict-heterologous-pathways.html

[33] https://cameo.bio/08-high-level-API.html

 