# Background

## Introduction

Turbulent flow is a flow regime characterized by chaotic property changes, including rapid variation of pressure and flow velocity in space and time. In contrast to laminar flow, the fluid no longer travels in layers, and mixing across the pipe is highly efficient.

The turbulent regime is characterized by large values of the Reynolds number:

$$ 
\begin{equation}
Re = \frac{V R}{\nu} 
\end{equation}
$$

Here $V$ is the characteristic flow velocity, $R$ is the radius of the pipe cross-section, and $\nu$ is the kinematic viscosity of the fluid.

Flows at Reynolds numbers above 4000 are typically (but not necessarily) turbulent, while those at Reynolds numbers below 2300 usually remain laminar. Flow in the range 2300 to 4000 is known as transitional.
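
As a minimal sketch of the definition and thresholds above, the snippet below computes $Re$ and labels the regime. The function names and the example values are illustrative assumptions, not part of the project code or the dataset.

```python
def reynolds_number(velocity, radius, kinematic_viscosity):
    """Re = V * R / nu, using the pipe radius as the length scale."""
    return velocity * radius / kinematic_viscosity


def flow_regime(re, laminar_limit=2300.0, turbulent_limit=4000.0):
    """Classify a flow using the thresholds quoted in the text."""
    if re < laminar_limit:
        return "laminar"
    if re <= turbulent_limit:
        return "transitional"
    return "turbulent"


# Illustrative values only (not taken from the dataset).
re_example = reynolds_number(velocity=1.0, radius=0.05, kinematic_viscosity=1.0e-6)
print(re_example, flow_regime(re_example))
```

The thresholds are kept as keyword arguments because they are rules of thumb rather than sharp limits.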

In the central section of the pipe, the velocity profile is flatter for turbulent flows than for laminar flows. The diffusivity of turbulent flow makes the flow velocity drop rapidly close to the pipe walls.

To describe turbulent pipe flow, we employ the power-law velocity profile function given by the equation below:
$$
\begin{equation}
u(r) = U_{max}\left(1-\frac{r}{R}\right)^{\frac{1}{n}}
\end{equation}
$$
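
The profile can be evaluated directly from this formula. The sketch below assumes illustrative values $U_{max}=10$ and $n=7$ (not fitted values) and uses the pipe radius of 0.45 m that appears later in the notebook; `power_law_profile` is a hypothetical helper name.

```python
import numpy as np


def power_law_profile(r, u_max, n, radius):
    """Evaluate u(r) = U_max * (1 - r/R)**(1/n) for 0 <= r <= R."""
    r = np.asarray(r, dtype=float)
    return u_max * (1.0 - r / radius) ** (1.0 / n)


radius = 0.45                     # pipe radius in m
r = np.linspace(0.0, radius, 5)   # from the centreline (r = 0) to the wall (r = R)
u = power_law_profile(r, u_max=10.0, n=7.0, radius=radius)
print(u)  # the maximum is at the centreline and the velocity is zero at the wall
```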

For this project we aim to apply regression and interpolation techniques, using the power-law velocity profile function and the provided real-world flow data, to predict the flow velocity profile for unobserved flow conditions.

# Dataset Description

The dataset provided consists of 6 sub-datasets in .txt format, each containing the following flow data:
1. Re_tau, the Reynolds number (dimensionless) of the flow.
2. U_tau, the friction velocity of the flow in $m/s$.
3. nu, the kinematic viscosity of the fluid in $m^2/s$.
4. R, the radius of the pipe in $m$.
5. $N$ entries of position $y$ in $m$ and corresponding measurements of velocity $U$ at that position in $m/s$.

The Reynolds numbers of the flows considered for this project are [5522, 9906, 21337, 31793, 35061, 39855].

The dataset with Re_tau = 35061 is designated as the validation dataset, so it is excluded from further preprocessing and preparation.

A brief overview of the other datasets is given below:



### Re = 5522

Dataset with $N$ = 25 entries, with values of $U$ (m/s) ranging from 1.26065 to 5.25481.

### Re = 9906

Dataset with $N$ = 31 entries, with values of $U$ (m/s) ranging from 2.45238 to 9.80688.

### Re = 21337

Dataset with $N$ = 35 entries, with values of $U$ (m/s) ranging from 5.7972 to 23.32029.

### Re = 31793

Dataset with $N$ = 38 entries, with values of $U$ (m/s) ranging from 8.6861 to 34.22929.

### Re = 39855

Dataset with $N$ = 35 entries, with values of $U$ (m/s) ranging from 13.23275 to 43.51941.

# Method

### Preprocessing

Data Extraction: Since the data is stored in .txt format, it cannot be used for computation directly. The first step is to extract the $N$ entries of $y$ and velocity $U$ into a dataframe with float data type, which enables computation and visualization of the data.

Data Cleaning: Unnecessary parts of the data are dropped. In this case the $u'$ column (velocity fluctuation, in m/s) is dropped as it is not required for the project. Additionally, U_tau and nu are ignored, as the computations do not need them explicitly.

Data Preparation: To simplify computation, a derived column is added to the dataframe. The power-law velocity profile requires the value of
$$
\begin{equation}
1-\frac{r}{R}
\end{equation}
$$
but $r$ is not given explicitly in the dataset. Using the wall distance $y$ provided, with $r = R - y$, we compute the above expression as
$$
\begin{equation}
1-\frac{R-y}{R}
\end{equation}
$$
and add it to the dataframe.
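
Algebraically, $1-\frac{R-y}{R}$ reduces to $\frac{y}{R}$, so the derived column is simply the wall distance normalized by the pipe radius. The short check below illustrates this with made-up $y$ values (an assumption for illustration, not dataset entries).

```python
import numpy as np

R = 0.45                              # pipe radius in m, as used in the notebook
y = np.array([0.01, 0.10, 0.45])      # hypothetical wall distances in m

x_long = 1.0 - (R - y) / R            # expression used in the preprocessing
x_short = y / R                       # simplified form

print(np.allclose(x_long, x_short))
```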

A function performing the above preprocessing steps is defined. The function extract_data() takes the flow's Reynolds number as an argument and returns a processed dataframe.

In [None]:
# Create dataframe
import pandas as pd


def extract_data(rey_num):
    """Load one flow dataset and return a dataframe ready for regression."""
    base_path = '/work/Files_for_Flow/Data/Retau_%%_basic_stats.txt'
    path = base_path.replace('%%', str(rey_num))
    with open(path, 'r') as data_file:
        rows = [line.split() for line in data_file]
    rows = rows[6:]  # skip the first 6 lines, which are not measurement rows
    R = 0.45  # pipe radius in m, hard-coded here
    rey_df = pd.DataFrame(rows).astype(float)
    rey_df.columns = ['y(m)', 'U(m/s)', "u'(m/s)"]
    rey_df['r/R'] = (R - rey_df['y(m)']) / R
    rey_df['regval_x'] = 1 - rey_df['r/R']  # equals y/R
    # Keep only the columns used later: U(m/s), r/R and regval_x
    rey_df.drop(columns=["u'(m/s)", 'y(m)'], inplace=True)
    return rey_df

### Linearity/ Non-Linearity

Inspection of the power-law velocity profile function shows a non-linear relationship between the variables, and visualization of the data confirms it.
A plot of $U$ against $r/R$ for all the sub-datasets gives curves of the kind shown below:

![fig1](/work/Files_for_Flow/Figures/Turbulent-flow-profiles.png)

Before proceeding with regression, we linearize the data using a technique known as log-linearization.
Log-linearization turns a non-linear relationship into a linear one by taking the logarithm of both sides; for a power law this transformation is exact.
Taking the natural logarithm of the power-law profile, we have
$$
\begin{equation}
ln(u(r))=ln(U_{max})+ln[(1-\frac{r}{R})^{\frac{1}{n}}]
\end{equation}
$$
$$
\begin{equation}
ln(u(r))=ln(U_{max})+\frac{1}{n}ln[(1-\frac{r}{R})]
\end{equation}
$$

By inspection, the equation has the linear form $y = b + mx$, where
$$
\begin{equation}
y=ln(u(r))
\end{equation}
$$
$$
\begin{equation}
b=ln(U_{max})
\end{equation}
$$
$$
\begin{equation}
m=\frac{1}{n}
\end{equation}
$$
$$
\begin{equation}
x=ln[(1-\frac{r}{R})]
\end{equation}
$$

Therefore, regression will be performed using $ln(u(r))$ as the dependent variable and $ln[(1-\frac{r}{R})]$ as the independent variable.
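
To illustrate why the transformation helps, the sketch below generates noise-free synthetic data from the power law (assumed values $U_{max}=10$, $n=7$, $R=0.45$) and fits a straight line to the logged variables. It uses `np.polyfit` purely as a convenient stand-in for the regression described in the next section.

```python
import numpy as np

u_max_true, n_true, radius = 10.0, 7.0, 0.45   # illustrative values, not fitted ones
y = np.linspace(0.01, radius, 20)              # wall distance, avoiding y = 0 (log undefined)
u = u_max_true * (y / radius) ** (1.0 / n_true)

x_log = np.log(y / radius)                     # x = ln(1 - r/R)
u_log = np.log(u)                              # y = ln(u)

# polyfit returns the highest-degree coefficient first: [slope, intercept]
slope, intercept = np.polyfit(x_log, u_log, deg=1)
print(np.exp(intercept), 1.0 / slope)          # recovers U_max and n up to rounding
```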

### Regression

Linear regression models the relationship between two variables by fitting a linear equation to the observed data.
For this project we apply the least-squares regression method, which finds the best-fitting line by minimizing the sum of the squared errors between the observed data and the line.
In matrix form, the regression can be written as
$$
\begin{equation}
V^T Va= V^T y
\end{equation}
$$
where $V$ is the Vandermonde matrix of the independent variable, given by
$$
V = \begin{bmatrix}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_N
\end{bmatrix}
$$
$a$ represents the regression coefficients ($b$ and $m$) to be solved for, $y$ represents the dependent variable, and $N$ is the number of data points.

To solve for the regression coefficients, the normal equations are rearranged as follows:
$$
\begin{equation}
a = (V^T V)^{-1} V^T y
\end{equation}
$$
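
The normal equations can be sketched on synthetic data as follows. The data, the noise level, and the comparison against `np.linalg.lstsq` are assumptions added for illustration; they are not part of the project pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 0.0, 30)                            # synthetic independent variable
y = 2.0 + 0.15 * x + rng.normal(scale=0.01, size=x.size)  # line b + m*x with small noise

V = np.column_stack([np.ones_like(x), x])                 # Vandermonde matrix [1, x]

# Solve V^T V a = V^T y without forming an explicit inverse
a_normal = np.linalg.solve(V.T @ V, V.T @ y)

# Cross-check with NumPy's least-squares solver
a_lstsq, *_ = np.linalg.lstsq(V, y, rcond=None)

print(a_normal, a_lstsq)
```

Solving the linear system directly is generally preferable to computing $(V^T V)^{-1}$ explicitly, which is why `np.linalg.solve` is used here.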

### Coefficient Transformation

Recall that the power-law velocity profile function was linearized before regression, so the coefficients obtained from the regression must be transformed back. Equations (8) and (9) are inverted as follows:
$$
b=ln(U_{max})
$$
$$
\begin{equation}
\therefore U_{max}=e^{b}
\end{equation}
$$

Also,
$$
m=\frac{1}{n}
$$
$$
\begin{equation}
\therefore n=\frac{1}{m}
\end{equation}
$$
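
A minimal sketch of this back-transformation is given below, together with the evaluation of the fitted profile in the original variables, $U = U_{max}\,x^{1/n}$ with $x = 1 - r/R$. The helper names and the example coefficients are hypothetical.

```python
import math


def back_transform(b, m):
    """Map the fitted intercept b and slope m to (U_max, n)."""
    return math.exp(b), 1.0 / m


def predict_velocity(x, b, m):
    """Evaluate the fitted power law at x = 1 - r/R in the original variables."""
    u_max, n = back_transform(b, m)
    return u_max * x ** (1.0 / n)


# Illustrative coefficients corresponding to U_max = 10 and n = 7
b_example, m_example = math.log(10.0), 1.0 / 7.0
print(back_transform(b_example, m_example))
print(predict_velocity(0.5, b_example, m_example))
```

Note that the prediction is a product of $U_{max}$ and a power of $x$, which is equivalent to exponentiating the fitted line $b + m\ln x$.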


In Python, a function linreg() that takes the Reynolds number as an argument is defined to:
1. Create the Vandermonde matrix.
2. Perform the least-squares regression using the np.linalg.solve() function.
3. Create a visualization of the best-fit line on the linearized data.
4. Store the transformed regression coefficients $U_{max}$ and $n$.
5. Store the mean absolute error of the regression on the original data.

The python code is given in the code block below:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

Coefficient = []       # transformed coefficients [U_max, n] per dataset
Coeff_untrans = []     # raw regression coefficients [b, m] per dataset
regression_error = []  # mean absolute error per dataset


def linreg(rey_num):
    # Columns 0 and 2 of the dataframe are U(m/s) and regval_x = 1 - r/R
    array = extract_data(rey_num).to_numpy()[:, [0, 2]]
    array_log = np.log(array)
    x_axis = array_log[:, 1]
    y_axis = array_log[:, 0]
    # Vandermonde matrix with columns [1, x]
    V = np.column_stack([np.ones_like(x_axis), x_axis])
    A = V.T @ V
    rhs = V.T @ y_axis
    a = np.linalg.solve(A, rhs)  # solve the normal equations for [b, m]
    Coeff_untrans.append(a)
    y_guess = a[0] + a[1] * x_axis
    plt.plot(x_axis, y_axis, 'o', label=rey_num)
    plt.plot(x_axis, y_guess, color='black', label='regression')
    plt.xlabel('ln(1-r/R)')
    plt.ylabel('ln(U)')
    plt.legend()
    plt.show()  # visualize the best-fit line
    Coefficient.append([np.exp(a[0]), 1 / a[1]])  # store U_max and n
    # Back-transform the fit to the original variables: U = U_max * x**(1/n)
    y_guesstrans = np.exp(a[0]) * array[:, 1] ** a[1]
    regression_error.append(np.mean(np.abs(array[:, 0] - y_guesstrans)))  # store the MAE

### Interpolation 

Interpolation is a numerical method for estimating unknown values that fall between known values. For this project, we interpolate the values of $U_{max}$ and $n$ at $Re_{\tau}=35061$ using the values of $U_{max}$ and $n$ obtained from the regression procedure for the other observed flows.
In particular, we apply Lagrangian interpolation. The interpolant has the form
$$
\begin{equation}
p(x) = \sum_{j=0}^n a_j L^j(x)
\end{equation}
$$
$$
\begin{equation}
L^j(x) = \prod_{\substack{0 \leq k \leq n \\ k \neq j}} \frac{x - x_k}{x_j - x_k}
\end{equation}
$$

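where $x$ is the target interpolation point, which in our case is $Re_{\tau}=35061$, $a_j$ is the known value of the dependent variable at node $x_j$, and $n+1$ is the number of nodes (this $n$ is unrelated to the power-law exponent).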
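
As a small worked example of these formulas, the sketch below interpolates three samples of $f(x) = x^2 + x + 1$. Because three nodes define a unique quadratic, the interpolant reproduces $f$ exactly. The function name `lagrange_eval` and the sample points are assumptions chosen for illustration.

```python
def lagrange_eval(x_nodes, f_nodes, x_target):
    """Evaluate the Lagrange interpolating polynomial at x_target."""
    total = 0.0
    for j, (x_j, f_j) in enumerate(zip(x_nodes, f_nodes)):
        basis = 1.0  # j-th basis polynomial evaluated at x_target
        for k, x_k in enumerate(x_nodes):
            if k != j:
                basis *= (x_target - x_k) / (x_j - x_k)
        total += f_j * basis
    return total


x_nodes = [0.0, 1.0, 2.0]
f_nodes = [1.0, 3.0, 7.0]          # samples of f(x) = x**2 + x + 1
value = lagrange_eval(x_nodes, f_nodes, 1.5)
print(value)                       # f(1.5) = 4.75
```

With the five training Reynolds numbers as nodes, the same construction yields a degree-4 polynomial in $Re_{\tau}$.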


A function Lag_interp() is defined. It takes as arguments the independent variable, the dependent variable, and the value of the independent variable at the target point where we wish to estimate the dependent variable. It returns the value of the interpolation at the target point.

Note: For the interpolation, the Reynolds number is considered the independent variable ($x$), while $U_{max}$ and $n$ represent the dependent variables $f(x)$.

A code snippet of the function is given below:

In [None]:
def Lag_interp(xi, fi, x_target):
    """Evaluate the Lagrange interpolating polynomial through (xi, fi) at x_target."""
    x_target = float(x_target)
    a_l = fi
    Ns = len(xi)
    p_l = 0.0  # running value of the interpolating polynomial

    for j in range(Ns):
        Lp_j = 1.0  # j-th Lagrange basis polynomial evaluated at x_target
        for k in range(Ns):
            if j != k:
                Lp_j *= (x_target - xi[k]) / (xi[j] - xi[k])
        p_l += a_l[j] * Lp_j
    return p_l

# Results

### Regression

For the regression problem, the table below presents the transformed coefficients $U_{max}$ and $n$ obtained for each Reynolds number.

Table 1

| $Re_{\tau}$ | Estimated $n$ | Estimated $U_{max}$ (m/s) |
|:--------:|:-----------:|:--------------:|
| 5522 | 5.155870647871982 | 5.868295678037469 |
| 9906 | 5.993802470889954 | 10.663754743490907 |
| 21337 | 6.490288965338497 | 24.822634070267245 |
| 31793 | 6.6032100335694865 | 37.626684902531764 |
| 39855 | 45.930574945848456 | 45.930574945848456 |

The $n$ entry for $Re_{\tau}=39855$ repeats the $U_{max}$ value verbatim, which looks like a transcription slip; the fitted exponent for that case cannot be recovered from the text.



A visualization of the regression plot in comparison to the linearized data is shown below:
For $Re_{\tau}=5522$
![5k](/work/5kregres.png)

For $Re_{\tau}=9906$
![10k](/work/10kregres.png)

For $Re_{\tau}=21337$
![20k](/work/20kregres.png)

For $Re_{\tau}=31793$
![30k](/work/30kregres.png)

For $Re_{\tau}=39855$
![40k](/work/40kregres.png)


Furthermore, the table below presents the mean absolute error of the regression on the original data for each Reynolds number.

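Table 2

| $Re_{\tau}$ | Regression error (m/s) |
|:--------:|:-----------:|
| 5522 | 3.483274270624879 |
| 9906 | 5.6067924277293955 |
| 21337 | 11.129099564450458 |
| 31793 | 14.659365071621792 |
| 39855 | 18.116171030432206 |

These errors are of the same order as the measured velocities. They were obtained by evaluating the back-transformed prediction as $U_{max} + n\,(1-r/R)$, so they likely overstate the misfit of the power-law profile $U_{max}(1-r/R)^{1/n}$.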

### Interpolation

After interpolation, the values of $U_{max}$ and $n$ obtained for $Re_{\tau}=35061$ are:
$$
Re_{\tau}=35061 ; \quad U_{max}=41.221285003034794 ; \quad n= 6.832213365701882
$$


The code snippet that returns the value of the interpolation at the target point is presented below:

In [None]:
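# Reynum_array is assumed to hold the Reynolds numbers of the five training
# datasets, in the same order as the rows of Coefficient.
X_val = Reynum_array
Y_val = np.array(Coefficient)  # columns: U_max, n
U_max35 = Lag_interp(X_val, Y_val[:, 0], 35061)
n_35 = Lag_interp(X_val, Y_val[:, 1], 35061)
print(U_max35, n_35)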

Visualizing the velocity profile built from the interpolated $U_{max}$ and $n$ gives the curve shown below:

![velocityprofile](/work/Interp2.png)

### Validation/Mean Absolute Error

After interpolation, the mean absolute error between the interpolated values of $U$ (m/s) and the provided validation data is:
$$
\begin{equation}
Error= 1.38
\end{equation}
$$
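
The validation step can be sketched as follows. The measurement arrays here are hypothetical placeholders rather than the actual validation data, so the printed error is not the value reported above; only the rounded interpolated coefficients are taken from the results.

```python
import numpy as np


def mean_absolute_error(u_observed, u_predicted):
    """Mean of |observed - predicted| over all measurement points."""
    u_observed = np.asarray(u_observed, dtype=float)
    u_predicted = np.asarray(u_predicted, dtype=float)
    return float(np.mean(np.abs(u_observed - u_predicted)))


radius = 0.45                                  # pipe radius in m
y_obs = np.array([0.05, 0.15, 0.30, 0.45])     # hypothetical wall distances in m
u_obs = np.array([30.0, 35.0, 39.0, 41.0])     # hypothetical velocities in m/s

u_max, n = 41.221, 6.832                       # interpolated coefficients, rounded
u_pred = u_max * (y_obs / radius) ** (1.0 / n) # power-law prediction at each y
mae = mean_absolute_error(u_obs, u_pred)
print(mae)
```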


# Discussions

In this project, both regression and interpolation techniques were employed to estimate the velocity profile of a turbulent flow from other turbulent flows. A visual inspection of the regression results shows that the log-linearized data is well approximated by a straight line and that the regression line fits the linearized data well. The $U_{max}$ values in Table 1 increase with the Reynolds number. This behaviour aligns with theory, as velocity generally increases with the Reynolds number.
Additionally, the mean absolute errors (MAE) of the regression in Table 2 grow with the Reynolds number, in step with the velocity scale of each dataset.

After interpolation, we get estimated values of $U_{max}=41.221$ and $n=6.8322$ for $Re_{\tau}=35061$. The resulting velocity curve aligns well with the validation data. The mean absolute error of the estimate, 1.38, is small relative to the interpolated $U_{max}$ of about 41, which supports the interpolation procedure carried out.

Overall, the predicted flow properties agree well with the validation data.

# Conclusion

We have confirmed the validity of least-squares regression and Lagrangian interpolation for predicting fluid flow properties. This has the advantage of reducing the frequency of experimental data collection in industry and research, as data points of interest can be estimated with fairly good accuracy from existing data.
To refine this estimation, future work could compare other interpolation and regression techniques and select the one that fits the data with the smallest error.
Also, more observed data points could be added, as this provides more reference for the estimation and should improve its accuracy.

# Bibliography

Prabhaker, R. (2011) 'Mathematical Methods Interpolation', Guru Nanak Engineering College, Ibrahimpatnam, Hyderabad.

Salyer, K. D. (2000) 'An Introduction to Log Linearization', University of California Davis press.
