### how to stream data into a linear regression model and print the output?

To stream data into a linear regression model and print the output in Python, you can follow these steps:

1. **Install Required Libraries**: Make sure you have the necessary libraries installed. You'll need `pandas` for data manipulation, `scikit-learn` for linear regression, and `numpy` for numerical operations. You can install them using pip:

   ```bash
   pip install pandas scikit-learn numpy
   ```

2. **Prepare Data**: Prepare your data in a format that can be used for linear regression. Ensure that you have a dataset with numeric features and a target variable. You can load the data from a CSV file or generate it programmatically.

3. **Import Required Libraries**: In your Python script, import the necessary libraries:

   ```python
   import pandas as pd
   from sklearn.linear_model import LinearRegression
   import numpy as np
   ```

4. **Load and Prepare Data**: Load your dataset into a pandas DataFrame and split it into the feature matrix (`X`) and the target variable (`y`):

   ```python
   # Load data from CSV or prepare it programmatically
   data = pd.read_csv('your_data.csv')

   # Split into features (X) and target variable (y)
   X = data[['feature1', 'feature2']]  # Replace with your feature columns
   y = data['target']  # Replace with your target column
   ```

5. **Create and Fit Linear Regression Model**: Create an instance of the LinearRegression class and fit the model to your data:

   ```python
   # Create linear regression model
   model = LinearRegression()

   # Fit the model to the data
   model.fit(X, y)
   ```

6. **Stream New Data and Print Output**: As new data points arrive, pass them to the fitted model and print the predicted output:

   ```python
   # Stream new data points and print predictions
   new_data = pd.DataFrame([[1.5, 3.0]], columns=['feature1', 'feature2'])  # Replace with your new data
   predictions = model.predict(new_data)
   print('Predicted output:', predictions)
   ```

   Replace `[[1.5, 3.0]]` with the actual values for your new data. The column names must match the feature columns used to fit the model.

7. **Run the Script**: Save your Python script and run it in your Python environment:

   ```bash
   python your_script.py
   ```

   The script will load the data, fit a linear regression model, stream new data points, and print the predicted output.
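The steps above can be combined into one self-contained sketch. Instead of reading a CSV file it generates a small synthetic dataset (the column names `feature1`/`feature2` and the generator `stream_rows` are illustrative assumptions), fits the model once, and then "streams" rows one at a time through a Python generator, printing a prediction for each.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ['feature1', 'feature2']  # hypothetical feature column names

# Synthetic training data: target = 3*feature1 - 2*feature2 + 5 (no noise)
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(100, 2)), columns=FEATURES)
train['target'] = 3 * train['feature1'] - 2 * train['feature2'] + 5

model = LinearRegression()
model.fit(train[FEATURES], train['target'])


def stream_rows(rows):
    """Hypothetical stand-in for a live source: yield one single-row DataFrame at a time."""
    for row in rows:
        yield pd.DataFrame([row], columns=FEATURES)


incoming = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
predictions = []
for new_data in stream_rows(incoming):
    pred = model.predict(new_data)[0]  # predict() returns one value per row
    predictions.append(pred)
    print('Predicted output:', pred)
```

In a real application, `stream_rows` would be replaced by whatever delivers new records (a socket, a file being tailed, a message queue); the model only needs each record as a DataFrame with the training columns.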

Make sure to adjust the code based on your specific dataset and requirements.
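`LinearRegression` has no `partial_fit` method, so it can only be refit from scratch on the full dataset. If the training data itself arrives as a stream, one option is scikit-learn's `SGDRegressor`, which fits a linear model by stochastic gradient descent and can be updated batch by batch with `partial_fit`. The sketch below swaps in that estimator and feeds it synthetic mini-batches (the data, batch size, and true coefficients are assumptions for illustration). SGD is sensitive to feature scale, so real features would usually need scaling first.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
true_coef = np.array([3.0, -2.0])
true_intercept = 1.0

model = SGDRegressor(random_state=0)

# Simulate 500 mini-batches of 20 rows each arriving over time
for _ in range(500):
    X_batch = rng.normal(size=(20, 2))
    noise = rng.normal(scale=0.1, size=20)
    y_batch = X_batch @ true_coef + true_intercept + noise
    model.partial_fit(X_batch, y_batch)  # one SGD pass over this batch only

print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)
```

Unlike the exact least-squares fit, the SGD estimate only approaches the true coefficients as more batches arrive.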

### for kafka

To stream data from Kafka into a linear regression model and print the output, you can follow these steps:

1. **Install Required Libraries**: Ensure that you have the necessary libraries installed. You'll need `pandas` for data manipulation, `scikit-learn` for linear regression, `numpy` for numerical operations, and `kafka-python` for interacting with Kafka. You can install them using pip:

   ```bash
   pip install pandas scikit-learn numpy kafka-python
   ```

2. **Import Required Libraries**: In your Python script, import the necessary libraries:

   ```python
   import pandas as pd
   from sklearn.linear_model import LinearRegression
   import numpy as np
   from kafka import KafkaConsumer
   ```

3. **Define Kafka Consumer**: Create a KafkaConsumer instance to consume data from the Kafka topic. Specify the bootstrap servers and topic name:

   ```python
   kafka_bootstrap_servers = 'localhost:9092'
   kafka_topic = 'your_topic'  # Replace with your Kafka topic

   consumer = KafkaConsumer(kafka_topic, bootstrap_servers=kafka_bootstrap_servers)
   ```

4. **Create and Fit Linear Regression Model**: Create an instance of the LinearRegression class and fit the model to the incoming data:

   ```python
   model = LinearRegression()

   # Accumulate every feature value (X) and target value (y) received so far
   X = []
   y = []

   # Consume messages from Kafka and collect data for model fitting
   for message in consumer:
       data = message.value.decode()  # Kafka delivers message values as bytes
       # Parse the incoming data into feature and target values
       # and append them to X and y respectively.
       # Replace the following lines with your data parsing logic
       feature_value, target_value = data.split(',')
       X.append(float(feature_value))
       y.append(float(target_value))

       # A line needs at least two points; wait until enough data has arrived
       if len(X) < 2:
           continue

       # Refit the model from scratch on all data received so far
       model.fit(np.array(X).reshape(-1, 1), np.array(y))

       # Print the current coefficient and intercept of the model
       print('Coefficient:', model.coef_[0])
       print('Intercept:', model.intercept_)
   ```

   In this example, the incoming data from Kafka is assumed to be a comma-separated string where the first value is the feature and the second is the target variable. Modify the parsing logic to match your Kafka message structure. Because `LinearRegression` cannot be updated incrementally, the loop refits it on the entire history for every message, so each update gets slower as data accumulates.

5. **Run the Script**: Save your Python script and run it in your Python environment:

   ```bash
   python your_script.py
   ```

   The script will consume messages from the specified Kafka topic, update the linear regression model, and print the current coefficient and intercept.
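Refitting on the full history makes every update slower as messages accumulate. For the single-feature case in this example, the ordinary least-squares slope and intercept can instead be maintained from running sums in constant time per message, which gives the same line that refitting `LinearRegression` on all points would. The sketch below is written against any iterable of objects with a bytes `.value` attribute, so the same loop could consume a `KafkaConsumer`; the class name `RunningLinearFit`, the helper `consume`, and the `"feature,target"` message format are assumptions carried over from the example above.

```python
from types import SimpleNamespace


class RunningLinearFit:
    """Ordinary least squares for y = slope * x + intercept, updated one point at a time."""

    def __init__(self):
        self.n = 0
        self.sum_x = self.sum_y = self.sum_xx = self.sum_xy = 0.0

    def update(self, x, y):
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def coefficients(self):
        """Return (slope, intercept), or None until at least two distinct x values are seen."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            return None
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


def consume(messages, fit):
    """Parse 'feature,target' messages and print the updated fit after each one."""
    for message in messages:
        try:
            feature_value, target_value = message.value.decode().split(',')
            fit.update(float(feature_value), float(target_value))
        except ValueError:
            print('Skipping malformed message:', message.value)
            continue
        coefs = fit.coefficients()
        if coefs is not None:
            print('Coefficient:', coefs[0])
            print('Intercept:', coefs[1])


# Fake messages standing in for a KafkaConsumer (points on y = 2x + 1, plus one bad record)
fake_messages = [SimpleNamespace(value=v) for v in [b'0,1', b'1,3', b'oops', b'2,5', b'3,7']]
fit = RunningLinearFit()
consume(fake_messages, fit)
```

For very large or poorly centered values, tracking running means and co-moments (as in Welford's method) avoids the cancellation that `n * sum_xx - sum_x ** 2` can suffer from.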

Make sure to adjust the code according to your specific Kafka configuration, topic name, and data format.

In [1]:
```python
import pickle

import numpy as np
import pandas as pd
from kafka import KafkaConsumer

# Load the scaler and regression model fitted during training
# (only unpickle files from a source you trust)
with open('scaler.pkl', 'rb') as f:
    SS = pickle.load(f)
with open('model.pkl', 'rb') as f:
    lr = pickle.load(f)

kafka_bootstrap_servers = 'localhost:9092'
kafka_topic = 'my_topic'  # Replace with your Kafka topic

# Use the first row of the test set as a sample record to score
df = pd.read_csv('test.csv')
x = df.head(1).values[0].tolist()
x = np.array(x).reshape(1, -1)
```


https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


In [2]:
```python
x
```

```
array([['1461', '20', 'RH', '80.0', '11622', 'Pave', 'nan', 'Reg', 'Lvl',
        'AllPub', 'Inside', 'Gtl', 'NAmes', 'Feedr', 'Norm', '1Fam',
        '1Story', '5', '6', '1961', '1961', 'Gable', 'CompShg',
        'VinylSd', 'VinylSd', 'nan', '0.0', 'TA', 'TA', 'CBlock', 'TA',
        'TA', 'No', 'Rec', '468.0', 'LwQ', '144.0', '270.0', '882.0',
        'GasA', 'TA', 'Y', 'SBrkr', '896', '0', '0', '896', '0.0', '0.0',
        '1', '0', '2', '1', 'TA', '5', 'Typ', '0', 'nan', 'Attchd',
        '1961.0', 'Unf', '1.0', '730.0', 'TA', 'TA', 'Y', '140', '0',
        '0', '0', '120', '0', 'nan', 'MnPrv', 'nan', '0', '6', '2010',
        'WD', 'Normal']], dtype='<U32')
```
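The `dtype='<U32'` in this output means every value, numbers included, was turned into a string: `df.head(1).values` gives an `object` array for mixed-type columns, and wrapping its `.tolist()` in `np.array` makes NumPy choose a common string type. That is why every column later shows up as `object`. The sketch below reproduces the effect on a made-up frame (the values are illustrative) and shows that keeping the row as a DataFrame (`df.iloc[[0]]`, or `df.head(1)` itself) preserves the numeric dtypes.

```python
import numpy as np
import pandas as pd

# Made-up frame mixing numeric and text columns, like test.csv
df = pd.DataFrame({
    'LotArea': [11622, 14267],
    'MSZoning': ['RH', 'RL'],
    'LotFrontage': [np.nan, 81.0],
})

as_array = np.array(df.head(1).values[0].tolist()).reshape(1, -1)
print(as_array.dtype)  # a Unicode string dtype: numbers became text, NaN became 'nan'

row = df.iloc[[0]]  # still a one-row DataFrame
print(row.dtypes)  # LotArea stays integer, LotFrontage stays float
```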

In [3]:
```python
x = pd.DataFrame(x, columns=df.columns)
data = x
```

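In [4]:
```python
data
```

|   | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1461 | 20 | RH | 80.0 | 11622 | Pave |  | Reg | Lvl | AllPub | ... | 120 | 0 |  | MnPrv |  | 0 | 6 | 2010 | WD | Normal |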


In [7]:
```python
data.dtypes
```

```
OverallQual     object
GrLivArea       object
GarageCars      object
GarageArea      object
TotalBsmtSF     object
1stFlrSF        object
FullBath        object
TotRmsAbvGrd    object
YearBuilt       object
YearRemodAdd    object
MasVnrArea      object
Fireplaces      object
GarageYrBlt     object
BsmtFinSF1      object
LotFrontage     object
WoodDeckSF      object
2ndFlrSF        object
OpenPorchSF     object
HalfBath        object
LotArea         object
BsmtFullBath    object
BsmtUnfSF       object
BedroomAbvGr    object
ScreenPorch     object
dtype: object
```

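In [12]:
```python
column_list = ['OverallQual', 'GrLivArea', 'GarageCars', 'GarageArea', 'TotalBsmtSF',
       '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt', 'YearRemodAdd',
       'MasVnrArea', 'Fireplaces', 'GarageYrBlt', 'BsmtFinSF1', 'LotFrontage',
       'WoodDeckSF', '2ndFlrSF', 'OpenPorchSF', 'HalfBath', 'LotArea',
       'BsmtFullBath', 'BsmtUnfSF', 'BedroomAbvGr', 'ScreenPorch']
# Keep only the model's feature columns and convert the string values to numbers
# ('nan' strings become real NaN values)
data = data[column_list].apply(pd.to_numeric, errors='coerce')
# Reassigning fillna's result instead of using inplace=True avoids SettingWithCopyWarning.
# Note: for a single row, these medians/modes come from the row itself,
# so they cannot fill a value that is missing
data = data.fillna(data.median())
data = data.fillna(data.mode().iloc[0])
```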
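Filling missing values from `data.median()` has no effect when `data` holds a single streamed row: the median of one value is that value, and a column that is missing has a median of NaN. A common alternative, sketched here assuming you still have the training frame (called `train`, with made-up values), is to compute the fill values once at training time, save them alongside the scaler and model, and apply them to every incoming row.

```python
import numpy as np
import pandas as pd

# Made-up training data standing in for the original training set
train = pd.DataFrame({
    'LotFrontage': [60.0, 80.0, np.nan, 70.0],
    'GarageYrBlt': [2000.0, np.nan, 1990.0, 1995.0],
})

# Compute fill values once, at training time (these could be pickled with the model)
fill_values = train.median()

# A single incoming row with a missing value
row = pd.DataFrame({'LotFrontage': [np.nan], 'GarageYrBlt': [1961.0]})

print(row.fillna(row.median()))  # LotFrontage is still NaN
print(row.fillna(fill_values))   # LotFrontage becomes the training median
```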

In [13]:
```python
# Scale the features with the fitted scaler, then predict the sale price
data = pd.DataFrame(SS.transform(data), columns=data.columns)
pred = lr.predict(data)
data['SalePrice'] = pred
```
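Because the scaler and the model live in two separate pickle files, they must be applied in the same order and to the same columns as during training. One option, sketched here on synthetic data with a hypothetical two-column subset of the features, is to wrap both in a scikit-learn `Pipeline` so that a single object is saved and a single `predict` call does the scaling and the prediction. Only the features pass through the scaler, so the prediction comes back in the original target units.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

columns = ['OverallQual', 'GrLivArea']  # hypothetical subset of the feature columns

# Synthetic training data: price = 20000 * quality + 50 * living area
rng = np.random.default_rng(1)
X_train = pd.DataFrame({
    'OverallQual': rng.integers(1, 11, size=200),
    'GrLivArea': rng.uniform(500, 3000, size=200),
})
y_train = 20000 * X_train['OverallQual'] + 50 * X_train['GrLivArea']

pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train[columns], y_train)

# Save and reload one object instead of separate scaler and model files
restored = pickle.loads(pickle.dumps(pipeline))

new_row = pd.DataFrame({'OverallQual': [5], 'GrLivArea': [896.0]})
pred = restored.predict(new_row[columns])
print('Predicted SalePrice:', pred[0])
```

`pickle.dumps`/`pickle.loads` keep the sketch in memory; in practice the pipeline would be written to and read from a file, with the same trust caveats as any pickle.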

In [14]:
```python
pred
```

```
array([114046.16848359])
```

```
  data.fillna(df.median(), inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```

|   | OverallQual | GrLivArea | GarageCars | GarageArea | TotalBsmtSF | 1stFlrSF | FullBath | TotRmsAbvGrd | YearBuilt | YearRemodAdd | ... | WoodDeckSF | 2ndFlrSF | OpenPorchSF | HalfBath | LotArea | BsmtFullBath | BsmtUnfSF | BedroomAbvGr | ScreenPorch | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -3.940503 | -2.884242 | -1.948171 | -2.211321 | -2.412214 | -3.010491 | -1.407826 | -3.449946 | -65.255046 | -96.131159 | ... | -0.758179 | -0.792501 | -0.701214 | 1.6803 | -1.054039 | 1.315644 | -1.286314 | -3.314119 | -0.275056 | -1164004.0 |


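In [None]:
```python
# Create the Kafka consumer (it is not created in any earlier cell)
consumer = KafkaConsumer(kafka_topic, bootstrap_servers=kafka_bootstrap_servers)

# Consume messages from Kafka and print each raw message
for message in consumer:
    data = message.value.decode()  # message values arrive as bytes
    print(data)
```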
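To go from printing raw messages to scoring them, each message has to be turned into a one-row DataFrame with the training columns before the model sees it. A sketch, assuming the producer sends each record as a JSON object keyed by column name (the message format, the helper name `score_messages`, and the tiny stand-in model are all assumptions), written against any iterable of objects with a bytes `.value` so it can be tried without a broker:

```python
import json
from types import SimpleNamespace

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

columns = ['OverallQual', 'GrLivArea']  # hypothetical feature columns

# Tiny stand-in model: SalePrice = 20000 * OverallQual + 50 * GrLivArea
X_train = pd.DataFrame({'OverallQual': [1, 5, 7, 10], 'GrLivArea': [500.0, 900.0, 2500.0, 1500.0]})
model = LinearRegression().fit(X_train, 20000 * X_train['OverallQual'] + 50 * X_train['GrLivArea'])


def score_messages(messages, model, columns):
    """Decode JSON messages, build a one-row DataFrame per record, and yield predictions."""
    for message in messages:
        record = json.loads(message.value.decode('utf-8'))
        # Keep only the model's columns, in training order; missing keys become NaN
        row = pd.DataFrame([record]).reindex(columns=columns)
        row = row.apply(pd.to_numeric, errors='coerce')
        if row.isna().to_numpy().any():
            print('Skipping record with missing values:', record)
            continue
        yield model.predict(row)[0]


fake_messages = [
    SimpleNamespace(value=json.dumps({'OverallQual': 5, 'GrLivArea': 896, 'Id': 1461}).encode()),
    SimpleNamespace(value=json.dumps({'OverallQual': 7}).encode()),
]
predictions = list(score_messages(fake_messages, model, columns))
print(predictions)
```

With a live broker, the same generator could be driven by `score_messages(consumer, ...)`, with the notebook's scaler and model (or a single fitted pipeline) in place of the stand-in.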
    