# Intro

Learning Modeling with Python and Pandas

If this is your first time working with Python, **STOP**. Before starting this notebook, first review [DataSciencePythonCrISPDM.ipynb](https://colab.research.google.com/drive/1cezizAGahyGMFobMU96Jwxjxluut0xIW?usp=drive_link).

Author is Michael McCarthy (mbmccart@utica.edu)

Feedback Welcomed

# Library Loading

In [None]:
# Python interactive development environments (IDEs) come with base Python 3.x, but
#     certain modules, packages, and libraries must be imported as needed.

# Pandas is the main way we will work with the dataframes
# https://pandas.pydata.org/docs/getting_started/index.html#getting-started
import pandas as pd
# pandas defaults to not showing all rows, this pd (pandas) option update ensures all rows are shown
# for larger datasets, update 'None' with a specific number
pd.set_option('display.max_rows', None)
# pandas defaults to not showing all columns, this pd (pandas) option update ensures all columns are shown
pd.set_option('display.max_columns', None)

# Matplotlib is a common visualization package.
# note that just pyplot is added in, not the full package
from matplotlib import pyplot as plt
# %matplotlib inline is a magic command that renders the figure in a notebook (instead of displaying a dump of the figure object).
%matplotlib inline

# NumPy is "The fundamental package for scientific computing with Python"
# https://numpy.org/
import numpy as np
# To make outputs more understandable, remove the scientific notation
np.set_printoptions(suppress=True)

# Setting a seed for reproducible results (important to have the TensorFlow random seed set as well)
# If the random seed is not set, then some models will have different results each time (a very frustrating thing)
#https://datascience.stackexchange.com/questions/13314/causes-of-inconsistent-results-with-neural-network
np.random.seed(1)

# package for descriptive statistics; there are others you can use, like seaborn
!pip install researchpy
import researchpy as rp
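
The effect of the seed set above can be checked with a minimal sketch. It assumes only NumPy and uses made-up variable names (`first_draw`, `second_draw`); resetting the same seed should reproduce the same random draws.

```python
import numpy as np

# Seeding the global NumPy generator makes the draws below repeatable.
np.random.seed(1)
first_draw = np.random.rand(3)

np.random.seed(1)  # reset to the same seed
second_draw = np.random.rand(3)

# The two arrays match because both started from the same seed.
print(np.array_equal(first_draw, second_draw))
```

Libraries with their own random generators (for example, TensorFlow) need their own seed set separately.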

In [None]:
# Packages & Libraries needed to load data from Google Drive
# For this class, ALL Data will be loaded from Google Drive.
# TIP: Load data just once.
# https://pypi.org/project/PyDrive2/

!pip install -U -q PyDrive2
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

In [None]:
# Authenticate the user to have access to Google Drive.
# Google will make you authorize access to connect directly to the Google Drive.
# The process might change; just approve the access by approving or clicking "Continue".

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Data Loading

In [None]:
## Steps to load your CSV data

# 1) Open the Google Drive with the Data
# 2) Find your assigned dataset
# 3) Click "share"
# 4) Copy link
# 5) Paste link here:
#    The link should look something like this: https://drive.google.com/file/d/1WVluSCNJ--RS1zqQ_0EJScPgurw9CmHj/view?usp=sharing
# 6) Copy the unique file id, for the example above, it looks like this: 1WVluSCNJ--RS1zqQ_0EJScPgurw9CmHj
#   Hint: the file id is all the content between the forward slashes after /d/ and before /view, including the letters, numbers, dashes, and underscores
# 7) In the next code line below, replace the unique google doc file id for the example data with the unique file id for your data.
file_id = '1WVluSCNJ--RS1zqQ_0EJScPgurw9CmHj' # replace the id with id of file you want to access
downloaded = drive.CreateFile({'id':file_id})

# 8) Update the file name below to match the file name in the Google Drive Folder by replacing 'Heart_Synthetic.csv' with your file name.
# Hint, you must have single or double quotes around the file name.
file_name = 'Heart_Synthetic.csv' # Update needed here: replace the file name with the name of the file you want
downloaded.GetContentFile(file_name)
df = pd.read_csv(file_name)

# 9) Run all cells before this code cell (Hint: Shortcut = CTRL + F8)
# 10) Run this cell (click the play button or Shift + Enter)
# 11) Check the output is what you expected
print(f"{file_name} Data Shape: ",df.shape)
print(df.head())
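
Step 6 above (copying the file id out of the share link) can also be done in code. The sketch below is one possible approach, not part of the original workflow: `extract_drive_file_id` is a hypothetical helper name, and it assumes the link follows the `/d/<file id>/view` pattern shown in the example.

```python
import re

def extract_drive_file_id(share_link):
    """Hypothetical helper: return the file id from a Drive share link, or None if no id is found."""
    # The id is the run of letters, numbers, dashes, and underscores right after "/d/"
    match = re.search(r"/d/([A-Za-z0-9_-]+)", share_link)
    return match.group(1) if match else None

share_link = "https://drive.google.com/file/d/1WVluSCNJ--RS1zqQ_0EJScPgurw9CmHj/view?usp=sharing"
file_id = extract_drive_file_id(share_link)
print(file_id)
```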

# Exploratory Data Analysis (EDA)

Once the data is loaded, we need to understand it.

This is the "Data Understanding" phase of the *Cross-Industry Standard Process for Data Mining* (CRISP-DM).

In [None]:
#load the descriptive statistics into a dataframe called "descriptive_statistics_view"
descriptive_statistics_view=df.describe()

In [None]:
# show the descriptive_statistics_view
descriptive_statistics_view

In [None]:
# Notice how the view changes using a print statement
print(descriptive_statistics_view)
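
By default, `describe()` summarizes only the numerical columns. A minimal sketch, using a made-up toy dataframe in place of the notebook's `df`, shows how `include="all"` also brings text columns into the summary.

```python
import pandas as pd

# Toy data standing in for the notebook's df (values are made up for illustration)
toy_df = pd.DataFrame({
    "ca": [0, 1, 2, 0],
    "THAL_string": ["normal", "fixed", "normal", "reversible"],
})

numeric_summary = toy_df.describe()            # numerical columns only
full_summary = toy_df.describe(include="all")  # adds count, unique, top, and freq for text columns

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
```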

In [None]:
# prompt: build histograms for all numerical values in df

# Iterate over numerical columns and create histograms
for col in df.select_dtypes(include=np.number):
  plt.figure()  # Create a new figure for each histogram
  plt.hist(df[col], bins=10)  # Adjust the number of bins as needed
  plt.title(f'Histogram of {col}')
  plt.xlabel(col)
  plt.ylabel('Frequency')
  plt.show()


# Wrangling

The data can be very "dirty" or not in the form we need it to be so we do considerable data wrangling.

This is the "Data Preparation" phase of CRISP-DM.

In [None]:
# prompt: review df for a list of multiple variable names from a list and drops the variables from the df. the default list is just one variable 'RecordID'
# NOTE, this was my third prompt. The first two were so vague that Gemini had a **very** long function.
"""
Gemini will often build a function. This is great because it is reusable within the notebook or can be brought over to another.
It is important that the function is run. That is best done in a separate cell after the function is defined.
"""
def drop_variables(df, variables_to_drop=['RecordID']):
    """
    Reviews a Pandas DataFrame for a list of variable names and drops them.

    Args:
        df: The input DataFrame.
        variables_to_drop: A list of variable names to drop. Defaults to ['RecordID'].

    Returns:
        A new DataFrame with the specified variables removed, or the original DataFrame if no variables are found.
        Prints a message indicating which variables were dropped or if none were found.
    """

    variables_dropped = []             # defines an empty list to add dropped variables to
    for var in variables_to_drop:      # for loop to look at each variable in the list
        if var in df.columns:          # if test to look if the variable name is in the dataframe's columns
            df = df.drop(var, axis=1)  # drop the variable from the dataframe
            variables_dropped.append(var)  # add the variable to the list of dropped variables

    if variables_dropped:   # a non-empty list evaluates as True, so this branch runs only if at least one variable was dropped
        print(f"Variables dropped: {variables_dropped}")
    else:                   # else statements are optional, but should be used
        print("No variables to drop found in the DataFrame.")

    # Final Notes:
    """
    Gemini did not identify the much simpler way to do this.
    However, this one line of code does not build or report the variables_dropped list.
    """
    # df = df.drop(variables_to_drop, errors='ignore', axis=1)
    """ This is how it might look in a production environment, where the df is modified in place instead of being reassigned (pandas may still copy data internally). """
    # df.drop(columns=variables_to_drop, errors='ignore', inplace=True) # `axis=1` is removed because `columns=` already identifies the axis

    return df               # the function returns the df with the listed variables removed


In [None]:
# Apply the function just defined
# update the list to include every variable to remove, such as numerical data that acts as a nominal variable (e.g., ID numbers)
df = drop_variables(df, variables_to_drop=['RecordID', 'SSN']) # SSN was added to remove list due to privacy concerns
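
The simpler one-line drop mentioned in the function's final notes can still report what was removed. The following is a sketch on made-up toy data, not the notebook's dataset; `found` and `trimmed_df` are illustrative names.

```python
import pandas as pd

# Toy data (made up); "not_a_column" is deliberately missing from the dataframe
toy_df = pd.DataFrame({"RecordID": [1, 2], "SSN": ["x", "y"], "age": [54, 61]})
variables_to_drop = ["RecordID", "SSN", "not_a_column"]

# Report which requested names are actually present before dropping them
found = [name for name in variables_to_drop if name in toy_df.columns]

# errors="ignore" skips names that are not in the dataframe instead of raising an error
trimmed_df = toy_df.drop(columns=variables_to_drop, errors="ignore")

print(f"Variables dropped: {found}")
print(trimmed_df.columns.tolist())
```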

In [None]:
# Assess the Data types.  Make sure we have numbers, not just strings.
print("\nTraining DATA\n")
print(df.dtypes)

In [None]:
#descriptive statistics
df.describe()

In [None]:
# Call the dataframe to show the variables and open Interactive table in Colab
# NOTE, if you scroll all the way to the right, click the Table icon to view an "interactive sheet" that acts like an excel spreadsheet
# The plot icon will generate many suggested plots, only some of them are worth using in your analysis.
df
# After you run and view your df, you should see the "Next steps:" options under the dataframe output. These are wise to use for the very first steps of your EDA.
# After you "View recommended plots", be sure to dig deeper with your own plots

# matplotlib

In [None]:
# prompt: build a histogram for all numerical variables in the df. set the number of bins to the number of unique values in the variable
"""
import pandas as pd # Gemini will include the packages you need even though you already imported these in the first code cell.
from matplotlib import pyplot as plt
"""

# Assuming 'df' is already loaded as in your provided code

for col in df.select_dtypes(include=['number']):
  num_unique = df[col].nunique()
  plt.figure(figsize=(8, 6))
  plt.hist(df[col], bins=num_unique)
  plt.title(f'Histogram of {col}')
  plt.xlabel(col)
  plt.ylabel('Frequency')
  plt.show()


# seaborn

In [None]:
# need the seaborn library loaded.
import seaborn as sns

In [None]:
#build the same histogram with seaborn library
#you have lots of options
sns.histplot(data=df, x='ca')
# This variable is unique to the example dataset in "Heart_Synthetic.csv", update, move, or delete this cell or it will error because
# the variables are not in the new dataset (i.e., dataframe or df)

In [None]:
# https://seaborn.pydata.org/generated/seaborn.catplot.html
sns.catplot(data=df, x="THAL_string", y="ca")
# This variable is unique to the example dataset in "Heart_Synthetic.csv", update, move, or delete this cell or it will error because
# the variable is not in the new dataset (i.e., dataframe or df)

In [None]:
#same data, but with a violin plot
#https://seaborn.pydata.org/generated/seaborn.violinplot.html#seaborn.violinplot
sns.violinplot(data=df, x="THAL_string", y="ca")

In [None]:
#swarm Plot
#https://seaborn.pydata.org/generated/seaborn.swarmplot.html#seaborn.swarmplot
sns.catplot(data=df, x="THAL_string", y="ca", hue="SEX_string", kind="swarm")


In [None]:
#Box & Whisker
#https://seaborn.pydata.org/generated/seaborn.boxplot.html#seaborn.boxplot
sns.boxplot(data=df, x="THAL_string", y="ca", hue="SEX_string")

In [None]:
# prompt: insert sns correlation matrix

# Correlation Matrix
# https://seaborn.pydata.org/generated/seaborn.heatmap.html
correlation_matrix = df.select_dtypes(include=np.number).corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.show()


In [None]:
# another way to use sns to view correlations
sns.pairplot(df)
plt.show()

In [None]:
# prompt: Build a loop to identify all numerical values in the df and then perform a boxplot and a histogram. Be sure to label each plot.

# Loop through columns to identify numerical features
for col in df.columns:
  if pd.api.types.is_numeric_dtype(df[col]):
    # Create a boxplot
    plt.figure(figsize=(8, 6))  # Adjust figure size as needed
    sns.boxplot(x=df[col])
    plt.title(f'Boxplot of {col}')
    plt.xlabel(col)
    plt.show()

    # Create a histogram
    plt.figure(figsize=(8, 6))  # Adjust figure size as needed
    sns.histplot(df[col])
    plt.title(f'Histogram of {col}')
    plt.xlabel(col)
    plt.show()

# plotly

In [None]:
#Look at the Dependent Variable as it relates to these categorical variables
#consider this type of side-by-side box and whisker
#https://plotly.com/python/plotly-express/
import plotly.express as px
px.box(data_frame=df,x='SEX_string', y='incident')


In [None]:
px.box(data_frame=df,x='Race_String', y='incident')
# Does the plot show outliers?

# Data prep for modeling

In [None]:
# Assess the shape before get dummies
df.shape

In [None]:
# List the variable names to paste into get_dummies below
df.dtypes

## Managing nulls and nans

In [None]:
# prompt: build a new dataframe called "no_nulls_df" that will apply the mean to each null in the numerical variables and the mode to each null in the categorical variables.

# Create a copy of the DataFrame to avoid modifying the original
no_nulls_df = df.copy()

# Fill nulls in numerical columns with the mean
numerical_cols = no_nulls_df.select_dtypes(include=np.number).columns
for col in numerical_cols:
    no_nulls_df[col] = no_nulls_df[col].fillna(no_nulls_df[col].mean())

# Fill nulls in categorical columns with the mode
categorical_cols = no_nulls_df.select_dtypes(exclude=np.number).columns
for col in categorical_cols:
    no_nulls_df[col] = no_nulls_df[col].fillna(no_nulls_df[col].mode()[0])


In [None]:
# Filling nulls does not change the shape of the dataframe.
# For the synthetic heart data, note that the df shape with get_dummies goes from 17 columns to 32 columns
no_nulls_df.shape
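
A quick way to confirm the imputation worked is to count nulls per column before and after. This is a minimal sketch on made-up toy data that applies the same mean and mode fills as the cell above.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in each column (values are made up)
toy_df = pd.DataFrame({
    "chol": [200.0, np.nan, 240.0],
    "SEX_string": ["male", None, "male"],
})

print(toy_df.isna().sum())  # nulls per column before imputation

filled_df = toy_df.copy()
filled_df["chol"] = filled_df["chol"].fillna(filled_df["chol"].mean())
filled_df["SEX_string"] = filled_df["SEX_string"].fillna(filled_df["SEX_string"].mode()[0])

print(filled_df.isna().sum())  # nulls per column after imputation
```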

## Managing Categorical Variables

*sklearn* models require numerical input, so string categorical variables must be encoded first.

*pandas* `get_dummies` transforms each categorical variable into indicator columns. In recent pandas versions these are boolean (True or False) by default, and some tools (for example, statsmodels later in this notebook) need them converted to 0 and 1 first.

*sklearn* has its own encoding tools that transform variables into 0 representing "False" and 1 representing "True". This is the better option.

In [None]:
# import sklearn for one hot encoding and test train split
# https://www.freecodecamp.org/news/how-to-build-and-train-linear-and-logistic-regression-ml-models-in-python/
import sklearn

In [None]:
# some categorical variables need to be transformed into numbers via one-hot encoding or get dummies.
# This is very important to do BEFORE splitting data into Testing and Training
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

# To run this, remove the '#' from the next line
# df= pd.get_dummies(data=df)

# NOTE: reusing the name df with get_dummies will overwrite the df from now on.
# If I want a unique dataframe that is different from the original dataframe, I need a different name like 'df_model'
# To encode only specific columns, add this argument inside get_dummies:
# columns=["SEX_string", "CP_string", "RESTECG_string","EXANG_string","FBS_string","SLOPE_string","THAL_string","Race_String"]
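
The sketch below shows `get_dummies` on made-up toy data. Passing `dtype=int` is an addition for illustration: it requests 0/1 columns directly instead of the boolean columns recent pandas versions return by default.

```python
import pandas as pd

# Toy data (made up) with one categorical and one numerical column
toy_df = pd.DataFrame({"SEX_string": ["male", "female", "male"], "age": [54, 61, 47]})

# columns= limits the encoding to the listed variables; dtype=int gives 0/1 instead of True/False
encoded_df = pd.get_dummies(toy_df, columns=["SEX_string"], dtype=int)

print(encoded_df)
```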

In [None]:
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
# interesting article about why one-hot encoding (ohe) is better in ML
# https://albertum.medium.com/preprocessing-onehotencoder-vs-pandas-get-dummies-3de1f3d77dcc
"""
import numpy as np
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(
    categories='auto',  # 'auto' determines the categories from each feature (variable); they can also be specifically delineated (see sklearn documentation link).
    drop=None, # None means do not drop one of the features; this is more important for dichotomous categorical variables (e.g., click, not-click)
    sparse_output=True, # Returns a sparse matrix if True (this parameter was named `sparse` before scikit-learn 1.2); a sparse matrix is more likely the more categories in the variable. "sparse matrix or sparse array is a matrix in which most of the elements are zero"
    dtype=np.float64, # Desired data type of the output; the input is a string; in most cases the ohe process is meant to make a number from strings.
    handle_unknown='error' # Whether to raise an error when an unknown category appears during transform; good to accept the default.
)
transformed = ohe.fit_transform(df[['categorical_Variable_Name']])  # double brackets pass the 2-D input the encoder requires
print(transformed.toarray())
"""

In [None]:
# For initial modeling with sklearn, we remove categorical variables from no_nulls_df, making num_no_nulls_df
# prompt: build a function to remove all non-numerical variables from a df and report the specific variables that are removed

def drop_non_numerical_cols(df):
    """
    Removes non-numerical columns from a Pandas DataFrame and reports the removed columns.

    Args:
        df: The input DataFrame.

    Returns:
        A tuple containing:
            - The DataFrame with non-numerical columns removed.
            - A list of the names of the removed columns.
    """
    numerical_df = df.select_dtypes(include=np.number)
    removed_cols = [col for col in df.columns if col not in numerical_df.columns]  # keeps the original column order
    print("\nRemoved columns: ")
    for col in removed_cols:
        print(col)
    return numerical_df, removed_cols


In [None]:
# For initial modeling with sklearn, we remove categorical variables from no_nulls_df making num_no_nulls_df

num_no_nulls_df, list_categorical_cols_removed = drop_non_numerical_cols(no_nulls_df)


In [None]:
# If categorical variables are encoded, we need to drop variables that report the same data in different ways,
# EX: dichotomous variables from get_dummies like "SEX_string_male" and "SEX_string_female"
# df.drop is used here because it can remove multiple columns in one go

no_nulls_df.drop(['insert_list_of_cat_vars_here_01','insert_list_of_cat_vars_here_02'],
        axis=1, # axis=1 means columns; the drop tool works on rows if axis=0, which is the default
        inplace=True,
        errors="ignore") # if a variable in the list is not in the df, then it will not error
# We only want numerical data for the linear regression.
# Nominal data must be transformed using `get_dummies` or ohe.
# Ordinal data can be transformed with get_dummies/ohe OR an OrdinalEncoder
# https://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.OrdinalEncoder.html
# Any "target" variables used for the classification analysis we can do with this dataset are not needed for the regression.
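
For ordinal data, the `OrdinalEncoder` mentioned above can be sketched as follows. The `severity` column and its levels are hypothetical and are not part of the example dataset; listing the categories explicitly is what fixes their order.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordinal variable; the column name and levels are made up for illustration
toy_df = pd.DataFrame({"severity": ["low", "high", "medium", "low"]})

# Listing the categories explicitly fixes the order: low=0, medium=1, high=2
encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])

# fit_transform returns a 2-D array, so ravel() flattens it into one column of codes
toy_df["severity_code"] = encoder.fit_transform(toy_df[["severity"]]).ravel()

print(toy_df)
```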

Most examples for modeling show how the data is **randomly** split into a training set (roughly 80% of the full dataset) and a testing set (the remaining 20% of the dataset).

In [None]:
#load test train split
from sklearn.model_selection import train_test_split

In [None]:
# look to see if the columns were dropped
num_no_nulls_df.dtypes

In [None]:
# stratified sampling
# Step 2, random sample of each DF (in this case, SEX_string == female and  SEX_string==male)
# Format from Stackoverflow
#larger, smaller = train_test_split(df, test_size=0.3)
#https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
#sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
X = num_no_nulls_df.drop("incident",axis='columns')
y =num_no_nulls_df["incident"]
print(y.shape)
print(X.shape)
# for analysis on people, we would typically want a stratified sample based on gender, but that is a categorical variable and not in this example.
# stratVar=X["SEX_string"] #define the variable used for the stratified sample
# a stratified sample ensures each category (or group) keeps the same proportional representation in both the training and testing dataframes
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.2, train_size=.8, random_state=7, shuffle=True)#, stratify=stratVar)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

#read more
# k fold stratification
#https://scikit-learn.org/stable/modules/cross_validation.html#stratification
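
The stratified split described in the comments above can be sketched on made-up toy data. The 75/25 mix of categories is an assumption chosen so the proportions divide evenly; `strat_var` is an illustrative name.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (made up): 15 "male" rows and 5 "female" rows, a 75/25 mix
toy_df = pd.DataFrame({
    "SEX_string": ["male"] * 15 + ["female"] * 5,
    "age": range(40, 60),
    "incident": range(20),
})
X = toy_df[["age"]]
y = toy_df["incident"]
strat_var = toy_df["SEX_string"]  # the variable used for the stratified sample

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=strat_var
)

# Both splits should keep the same 75/25 mix as the full data
print(strat_var.loc[X_train.index].value_counts(normalize=True))
print(strat_var.loc[X_test.index].value_counts(normalize=True))
```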

In [None]:
#analyze the y data (the target variable) used in the training
y_train.describe()

In [None]:
#analyze the y data (the target variable) used in the testing
y_test.describe()

In [None]:
#look at some variables with our loc and iloc skills
X_train.iloc[1 : 13, 0 : 11]
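
The difference between `iloc` (position) and `loc` (label) matters after a random split, because the row labels are no longer 0, 1, 2, and so on. A minimal sketch on made-up toy data:

```python
import pandas as pd

toy_df = pd.DataFrame(
    {"trestbps": [120, 140, 130], "chol": [200, 250, 230]},
    index=[175, 12, 98],  # shuffled row labels, as after a random split
)

by_position = toy_df.iloc[0, 0]          # first row, first column
by_label = toy_df.loc[175, "trestbps"]   # row labeled 175, column "trestbps"

print(by_position, by_label)
print(175 in toy_df.index)  # check that a label exists before using loc
```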

In [None]:
"""
# loc selects rows and columns with specific labels
# this example raises a KeyError unless both labels exist: the column "CP_string_asymptomatic angina" is only created by get_dummies
X_train.loc[[175], ["CP_string_asymptomatic angina","trestbps"]]
# this cell WILL ERROR if the index rows are in the testing dataset
"""

# Modeling sklearn

In [None]:
# numpy already loaded in Library Loading section
from sklearn.linear_model import LinearRegression

In [None]:
model = LinearRegression()

In [None]:
# Assess the updated X_test
X_test

In [None]:
model.fit(X_train, y_train)
# you can use just this statement to combine this cell with the previous cell
# model = LinearRegression().fit(X_train, y_train)

In [None]:
# see all the coefficients
pd.DataFrame(model.coef_, X_train.columns, columns = ['Coeff'])

In [None]:
r_sq = model.score(X_train, y_train)
print(f"The training R-squared (i.e., the coefficient of determination) is {r_sq}")
print(f"intercept: {model.intercept_}")
#print(f"coefficients: {model.coef_}")
# for regression models, model.score returns R-squared; this line scores the held-out test data
print('Test R-squared: {}'.format(model.score(X_test, y_test)))
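
The score reported above is the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. The sketch below recomputes it by hand on made-up synthetic data to show it matches `model.score`; the data and coefficients are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (made up): y depends linearly on two features plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X, y)
predictions = model.predict(X)

ss_res = np.sum((y - predictions) ** 2)  # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares around the mean
r_sq_manual = 1 - ss_res / ss_tot

print(r_sq_manual, model.score(X, y))
```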

In [None]:
#make predictions with the model
predictions = model.predict(X_test)

In [None]:
# Plot the predictions versus actual
plt.scatter(y_test, predictions)

In [None]:
# plot a histogram of the residuals to check the linear regression assumption that residuals are roughly normally distributed
plt.hist(y_test - predictions)

In [None]:
## setting plot style
plt.style.use('fivethirtyeight')

## plotting residual errors in training data
plt.scatter(model.predict(X_train), model.predict(X_train) - y_train,
            color = "green", s = 10, label = 'Train data')

## plotting residual errors in test data
plt.scatter(model.predict(X_test), model.predict(X_test) - y_test,
            color = "red", s = 10, label = 'Test data')

## plotting line for zero residual error (axhline spans the full x-axis, whatever the range of the predictions)
plt.axhline(y = 0, linewidth = 2)

## plotting legend
plt.legend(loc = 'upper right')

## plot title
plt.title("Residual errors")

## method call for showing the plot
plt.show()

# Modeling with statsmodels.api

In [None]:
#Load Library
# https://www.statsmodels.org/stable/api.html#regression
import statsmodels.api as sm


In [None]:
# if X_train has bool values, they need to be converted to numerical values of 0 or 1 for Statsmodels.api
# If string or object variables are in the data frame, then they also need to be encoded
# Convert boolean columns to numerical (0 and 1)
# prompt: build a function for a df to identify bool variables and transform them to int

def transform_bool_to_int(df):
    """
    Identifies boolean variables in a Pandas DataFrame and transforms them to integers (1 for True, 0 for False).
    Args:
        df: The input DataFrame.
    Returns:
        A DataFrame with boolean columns converted to integer type.
    """
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    for col in df.columns:
        if pd.api.types.is_bool_dtype(df[col]):
            df[col] = df[col].astype(int)
    return df


In [None]:
# apply the transform_bool_to_int
X_train = transform_bool_to_int(X_train)
X_test = transform_bool_to_int(X_test)
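
The same conversion can be written without an explicit loop. This is a sketch of one alternative on made-up toy data, using `select_dtypes` to find the boolean columns and `astype` with a dictionary to convert only those.

```python
import pandas as pd

# Toy data (made up) with one boolean and one numerical column
toy_df = pd.DataFrame({"SEX_string_male": [True, False, True], "age": [54, 61, 47]})

bool_cols = toy_df.select_dtypes(include="bool").columns

# astype returns a new dataframe, so toy_df itself is left unchanged
converted_df = toy_df.astype({col: int for col in bool_cols})

print(converted_df.dtypes)
```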

In [None]:
# set models
# NOTE, this data is the same used in sklearn
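# sm.OLS does not add an intercept by default; add_constant adds one so the model matches sklearn's LinearRegression
testmodel = sm.OLS(y_train, sm.add_constant(X_train))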
testmodel2 = testmodel.fit()
print(testmodel2.summary())
# this model summary provides the coefficients for the linear regression
# The p-value is reported in the 'P>|t|' column.
# The p-value should be below the alpha (commonly 0.05) to be considered significant.

# Modeling with pycaret

In [None]:
# https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb
# https://www.pycaret.org/tutorials/html/REG102.html

In [None]:
#install pycaret, for non colab notebooks, remove the ! before pip
!pip install --upgrade pycaret # this can take about 2 minutes to complete
import pycaret
pycaret.__version__
## IMPORTANT NOTE: Pycaret does not need to split the data into X_train etc.
#   Just the df with the target variable identified is all that is needed for Pycaret.

In [None]:
# important helper functions for pycaret in colab
## This might not be needed anymore
# option 1
#from pycaret.utils import enable_colab
#enable_colab()
# option 2
#from pycaret.utils import setup_colab # change it to this line.
#setup_colab() # and change this line from enable_colab() to setup_colab() as well.

In [None]:
# parameter information at: https://www.pycaret.org/tutorials/html/REG102.html

from pycaret.regression import *

In [None]:
# direct copy of code cell from Pycaret tutorial
#  using "df" name to define data rather than "data"
df = no_nulls_df.sample(frac=0.9, random_state=786)
data_unseen = no_nulls_df.drop(df.index)

df.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)

print('Data for Modeling: ' + str(df.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
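
The hold-out pattern above (sample 90% for modeling, keep the rest unseen) can be checked on made-up toy data. In this sketch, `modeling_df` and `unseen_df` are illustrative names.

```python
import pandas as pd

# Toy data (made up): 20 rows
toy_df = pd.DataFrame({"age": range(40, 60), "incident": range(20)})

modeling_df = toy_df.sample(frac=0.9, random_state=786)
unseen_df = toy_df.drop(modeling_df.index)  # the rows that were not sampled

# Reset the index only after the drop, because drop relies on the original row labels
modeling_df = modeling_df.reset_index(drop=True)
unseen_df = unseen_df.reset_index(drop=True)

print(modeling_df.shape, unseen_df.shape)
```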

In [None]:
# set up the model and identify the target variable (i.e., Dependent Variable)
# This is needed if you do ANY regression. Later you will identify a specific type of regression or
 # use the compare function to look at ALL possible regression methods.
reg01 = setup(data = df, target = 'incident', session_id=123, normalize=True, transform_target=True)
# setting normalize=True takes care of scaling problems between ordinal and continuous variables

# IMPORTANT NOTE: Depending on the version, Pycaret may pause processing until you approve the data types it automatically infers
# if prompted, scroll to the bottom of the output cell and press the "enter" key to accept or type "quit" to stop

In [None]:
# Builds all regression models. This is often called "autoML" for automatic machine learning
# these are good baseline models

#best = compare_models(exclude = ['ransac']) # exclude = ['ransac'] is from the tutorial
compare_models()
# This can take a little bit because it is building 15+ regression models.
# Goal is low error terms (MAE, MSE, RMSE, RMSLE, MAPE) and high R2.
# R2 is at most one; values approaching one indicate a strong fit.
# R2 values approaching zero are weak models; a negative R2 means the model predicts worse than simply using the mean.
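
Several of the metrics in the comparison table can be computed by hand. The sketch below uses made-up numbers and a hypothetical helper named `regression_metrics` to show how a poor set of predictions produces a negative R2.

```python
import numpy as np

# Made-up actual values and two sets of predictions
actual = np.array([10.0, 12.0, 14.0, 16.0])
good_pred = np.array([11.0, 12.0, 13.0, 16.0])
bad_pred = np.array([16.0, 10.0, 18.0, 8.0])

def regression_metrics(y_true, y_pred):
    """Hypothetical helper: return MAE, MSE, RMSE, and R2 for one set of predictions."""
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))   # mean absolute error
    mse = np.mean(errors ** 2)      # mean squared error
    rmse = np.sqrt(mse)             # root mean squared error
    r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

good = regression_metrics(actual, good_pred)
bad = regression_metrics(actual, bad_pred)
print(good)
print(bad)
```

Here the second set of predictions has larger squared errors than simply predicting the mean of `actual`, which is why its R2 falls below zero.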

In [None]:
# List of the models available from Pycaret
models() # from Pycaret Tutorial

In [None]:
# From Pycaret Tutorial
lin_reg = create_model('lr')

In [None]:
# identify the parameters for the lin_reg
print(lin_reg)

In [None]:
# Tuning models produced above
# This is considered iteration!

'''Tuning models explained:
"Model tuning is also known as hyperparameter optimization.
Hyperparameters are variables that control the training process.
These are configuration variables that do not change during a Model training job.
Model tuning provides optimized values for hyperparameters, which maximize your model's predictive accuracy."
https://www.mlexam.com/model-tuning/
'''

tuned_lr = tune_model(lin_reg)
# this might error out because the synthetic data did not produce significant models, see 'compare_models()' cell

In [None]:
# identify the parameters for the tuned lin_reg
# Compare to lin_reg
print(tuned_lr)
print(lin_reg)

In [None]:
#comparing model plots for model
plot_model(tuned_lr)

In [None]:
#this allows you to assess all the features included, not just the default of the top 10
plot_model(tuned_lr, plot='feature_all', scale = 1)