# Important Note
Use applications powered by large language models (LLMs) to find the code and content needed to answer all the questions in this assignment. Avoid using Google Search. Create a login ID or account with well-known LLM platforms such as Copilot, ChatGPT, Gemini, and Claude. These platforms will provide the resources you need to complete the tasks effectively.

# Q1) Use Gemini or any LLM to come up with new business or product ideas for millennials in Bangalore. The ideas should be aimed at software professionals and need only a small amount of money to start. Describe the prompts you used and share 3-5 of the most interesting ideas it came up with.

In [None]:
!pip install openai
!pip install langchain
!pip install cohere
!pip install langchain_community

In [None]:
import os
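# The Cohere wrapper lives in langchain_community (installed above);
# importing it from langchain.llms is deprecated in recent LangChain releases.
from langchain_community.llms import Cohere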
from google.colab import userdata

# Get the API key from userdata
cohere_api_key = userdata.get('COHERE')

# Set the COHERE_API_KEY environment variable
os.environ["COHERE_API_KEY"] = cohere_api_key

# Now initialize the Cohere LLM
llm = Cohere()
prompt = """
Consider yourself to be a software professional. Suggest some new business or product ideas for millennials in Bangalore that need only a small amount of money to start.
"""
result = llm.invoke(prompt)
print(result)

# Q3) Building and Validating the model

## Q3.1) Write code to import data from the below location

https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/Credit%20Risk%20Data_balanced/Credit_risk_data_bal_v2.csv

In [None]:
# prompt: Write code to import data from the below location
# https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/Credit%20Risk%20Data_balanced/Credit_risk_data_bal_v2.csv

import pandas as pd

url = "https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/Credit%20Risk%20Data_balanced/Credit_risk_data_bal_v2.csv"
df = pd.read_csv(url)
# You can now work with the dataframe 'df'
# For example, to display the first 5 rows:
print(df.head())
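Before splitting, it can help to confirm the import worked as expected. The sketch below is illustrative only: it reads a tiny in-memory CSV rather than the real file, and `feature_a` and `feature_b` are hypothetical column names (only `SeriousDlqin2yrs` comes from the assignment). The same three checks can be run on `df`.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the downloaded CSV.
# feature_a / feature_b are hypothetical columns used only for illustration.
sample_csv = io.StringIO(
    "SeriousDlqin2yrs,feature_a,feature_b\n"
    "0,0.5,3\n"
    "1,0.9,\n"
    "0,0.1,7\n"
    "1,0.7,2\n"
)
sample = pd.read_csv(sample_csv)

# 1) Size of the table
n_rows, n_cols = sample.shape
print(f"rows={n_rows}, columns={n_cols}")

# 2) Missing values per column (an empty CSV field is read as NaN)
missing_per_column = sample.isna().sum()
print(missing_per_column)

# 3) Class balance of the target variable
class_counts = sample["SeriousDlqin2yrs"].value_counts()
print(class_counts)
```

Missing values matter here because scikit-learn's `LogisticRegression` raises an error on NaN inputs, so they would need to be imputed or dropped before model building.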


## Q3.2) Write code to split the data into train and test sets

In [11]:
import sklearn
from sklearn.model_selection import train_test_split

In [12]:
train, test = train_test_split(df, test_size=0.2, random_state=42)

In [None]:
print(test.shape)
print(train.shape)
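The split above is purely random. For a classification target, one common variation is a stratified split, which keeps the share of each class the same in both parts. The sketch below is an optional variation, not part of the assignment's answer; it uses synthetic data with hypothetical feature columns and only borrows the target name from the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit-risk data: 700 zeros and 300 ones.
# feature_a / feature_b are hypothetical columns.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=1000),
    "feature_b": rng.normal(size=1000),
    "SeriousDlqin2yrs": np.repeat([0, 1], [700, 300]),
})

# stratify= keeps the class proportions equal in train and test
demo_train, demo_test = train_test_split(
    demo,
    test_size=0.2,
    random_state=42,
    stratify=demo["SeriousDlqin2yrs"],
)

train_rate = demo_train["SeriousDlqin2yrs"].mean()
test_rate = demo_test["SeriousDlqin2yrs"].mean()
print(demo_train.shape, demo_test.shape)
print(f"positive rate: train={train_rate:.3f}, test={test_rate:.3f}")
```

On the assignment data the equivalent call would pass `stratify=df['SeriousDlqin2yrs']`; since the file is described as balanced, the difference from a plain random split is likely to be small.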

## Q3.3) Write code to build the model. Use 'SeriousDlqin2yrs' as the target variable and all remaining columns as predictor variables. Print the beta coefficients

In [None]:
# prompt: Write code to build the model. Use 'SeriousDlqin2yrs' as target variable and rest all as predictor variables. Print the beta co-efficients using sklearn

from sklearn.linear_model import LogisticRegression

# Define features (X) and target (y)
X_train = train.drop('SeriousDlqin2yrs', axis=1)
y_train = train['SeriousDlqin2yrs']
X_test = test.drop('SeriousDlqin2yrs', axis=1)
y_test = test['SeriousDlqin2yrs']

# Initialize and train the logistic regression model
model = LogisticRegression(solver='liblinear')  # liblinear solver, suited to small datasets
model.fit(X_train, y_train)

# Print the beta coefficients (coefficients of the predictors)
print("Beta Coefficients:")
for feature, coef in zip(X_train.columns, model.coef_[0]):
  print(f"{feature}: {coef}")

print("\nIntercept:", model.intercept_[0])
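Raw beta coefficients depend on the units of each predictor, so they are hard to compare across columns. A minimal sketch of one common remedy follows: standardize the predictors inside a pipeline, then read the coefficients and their exponentials (odds ratios). The data is synthetic and the column names are hypothetical; this is an optional illustration, not the method the assignment requires.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: the target depends on feature_a only (hypothetical columns).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "feature_a": rng.normal(loc=50, scale=10, size=500),
    "feature_b": rng.normal(loc=0, scale=1, size=500),
})
y_demo = (X_demo["feature_a"] + rng.normal(scale=5, size=500) > 50).astype(int)

# Scale first, then fit; make_pipeline names each step after its class in lowercase.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_demo, y_demo)

# Coefficients are now per one standard deviation of each predictor.
coefs = pd.Series(
    pipe.named_steps["logisticregression"].coef_[0],
    index=X_demo.columns,
)
odds_ratios = np.exp(coefs)  # exp(beta) = multiplicative change in the odds

print("Standardized coefficients:")
print(coefs)
print("\nOdds ratios:")
print(odds_ratios)
```

An odds ratio above 1 means the predictor raises the odds of the positive class, and a value near 1 means it has little effect.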


## Q3.4) Perform Model Validation and Cross Validation on train and test data. Print the confusion matrix and accuracies

In [None]:
# prompt: Perform Model Validation and Cross Validation on train and test data. Print the confusion matrix and accuracies

from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import cross_val_score

# Predict on the training set
y_train_pred = model.predict(X_train)

# Evaluate the model on the training set
train_accuracy = accuracy_score(y_train, y_train_pred)
train_confusion_matrix = confusion_matrix(y_train, y_train_pred)

print("Training Set Performance:")
print(f"Accuracy: {train_accuracy}")
print(f"Confusion Matrix:\n{train_confusion_matrix}")

# Perform cross-validation on the training set (e.g., 5-fold)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"\nCross-Validation Scores (Training): {cv_scores}")
print(f"Mean CV Accuracy: {cv_scores.mean()}")

# Predict on the test set
y_test_pred = model.predict(X_test)

# Evaluate the model on the test set
test_accuracy = accuracy_score(y_test, y_test_pred)
test_confusion_matrix = confusion_matrix(y_test, y_test_pred)

print("\nTest Set Performance:")
print(f"Accuracy: {test_accuracy}")
print(f"Confusion Matrix:\n{test_confusion_matrix}")
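To see how the accuracy relates to the confusion matrix, the small worked example below recomputes accuracy, precision, and recall directly from the four cells. The labels are hand-made for illustration and are not taken from the credit-risk data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made labels for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# scikit-learn layout: rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # correct predictions / all predictions
precision = tp / (tp + fp)        # of predicted positives, how many are real
recall = tp / (tp + fn)           # of real positives, how many were found

print(cm)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")

# The manual figure matches sklearn's accuracy_score
assert np.isclose(accuracy, accuracy_score(y_true, y_pred))
```

For credit-risk models, recall on the delinquent class is often worth reporting alongside accuracy, because a missed defaulter and a wrongly flagged customer usually carry different costs.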


# Q4) Download the image from the below location and answer the below questions
https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/Inoices/Invoice_3.png



# Q2) Use Claude or any LLM to make 5 multiple-choice questions on pandas data handling and 5 on numpy. Try generating the questions several times and choose the best ones. Review the questions to ensure they are good and correct.

In [None]:
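# The opening triple quote must sit on the same line as the assignment;
# a bare `prompt =` followed by a newline is a syntax error.
prompt = """
Consider yourself an examiner. Create 5 multiple choice questions on data handling by pandas and 5 multiple choice questions on data handling by numpy. Choose the best questions after
"""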
result = llm.invoke(prompt)
print(result)
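The question asks for the questions to be generated several times before choosing the best ones. A minimal sketch of that loop is shown below. `generate_candidates` and `fake_invoke` are hypothetical helpers introduced here for illustration; `fake_invoke` stands in for `llm.invoke` so the example runs without an API key.

```python
from typing import Callable, List


def generate_candidates(invoke: Callable[[str], str], prompt: str, n_runs: int = 3) -> List[str]:
    """Call the model n_runs times and return the distinct responses in order."""
    seen = set()
    candidates = []
    for _ in range(n_runs):
        text = invoke(prompt).strip()
        # Skip empty responses and exact repeats
        if text and text not in seen:
            seen.add(text)
            candidates.append(text)
    return candidates


# Stand-in for llm.invoke: returns canned text, one item per call.
_fake_outputs = iter(["Q1 ... draft A", "Q1 ... draft B", "Q1 ... draft A"])


def fake_invoke(prompt: str) -> str:
    return next(_fake_outputs)


drafts = generate_candidates(fake_invoke, "Create 5 MCQs on pandas.", n_runs=3)
for i, draft in enumerate(drafts, start=1):
    print(f"--- Candidate {i} ---\n{draft}")
```

In the notebook, passing `llm.invoke` in place of `fake_invoke` would collect real drafts. Choosing the best questions and checking that each answer key is correct remains a manual review step.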