Solving Real-World ML Problems
Solving a machine learning problem goes more smoothly with a structured workflow that keeps the work accurate, clear, and effective. The main steps are:

1. Look at the Big Picture
Understand the overall problem you're solving. Define your objective clearly: what does success look like, and which metric will measure it?

2. Get the Data
Collect relevant, high-quality data from reliable sources. Without data, there is no machine learning.

3. Explore and Visualize the Data
Analyze and visualize data to uncover patterns, trends, and anomalies. This step helps you understand what you're working with.

4. Prepare the Data
Clean, transform, and format the data. Handle missing values, normalize features, and split the data into training and testing sets.

5. Select a Model and Train It
Choose an algorithm that matches the task (a regressor for a continuous target such as price, a classifier for discrete labels) and train it on the training set. This is where the model learns patterns from the data.

6. Fine-Tune Your Model
Optimize hyperparameters (for example, with a cross-validated grid search), compare alternative models, and improve performance through iteration.

7. Present Your Solution
Explain your model’s results using visuals, metrics, and clear language so stakeholders can understand and make decisions.

8. Launch, Monitor, and Maintain
Deploy the model to production, monitor its performance, and retrain it regularly as new data arrives.
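Steps 4 to 6 above can be sketched in a few lines of scikit-learn. This is a minimal illustration under stated assumptions, not a recommended configuration: the data is synthetic, and the imputation strategy, the scaler, and the small parameter grid are all placeholder choices. Tree-based models do not need scaled features; the scaler is included only to show where the "normalize features" step would sit.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption): 200 rows, 3 numeric features,
# and a continuous target driven by the first two features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.05] = np.nan  # blank out roughly 5% of the values

# Step 4: hold out 20% of the rows for testing before anything is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Steps 4-5: imputation and scaling live inside the pipeline, so they are
# fitted on training rows only and never see the test rows.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(random_state=42)),
])

# Step 6: a deliberately tiny grid, scored with 3-fold cross-validation.
param_grid = {
    "model__n_estimators": [25, 50],
    "model__max_depth": [None, 5],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline once on the held-out rows.
test_r2 = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Held-out R^2:", round(test_r2, 3))
```

Keeping preprocessing inside the pipeline means the same object can be cross-validated, tuned, and later deployed without repeating the cleaning steps by hand.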



In [None]:
import pandas as pd

# Load the smartphone dataset: six feature columns plus the target column `price`.
df = pd.read_csv("1750577174509-smartphone_data.csv")
print(df)

# Later cells should split the data 80/20 into training and testing sets.
print(df.head())     # first five rows
print(df.tail())     # last five rows
print(df.shape)      # (rows, columns)
print(df.columns)    # column names



     camera  age  ram  cpu_score  slot_sd  sims  price
0       110    2    4         89        1     2  31596
1       187    0    4         42        0     1  33530
2       100    2    4         52        0     2   9262
3        22    2    6         82        0     2  62199
4       114    2   12         88        1     1  66287
..      ...  ...  ...        ...      ...   ...    ...
995     197    1   16         54        1     2  14364
996      69    3    4         66        0     2  43349
997     101    0   12         50        0     1   8691
998     102    1   16         96        1     1  69363
999     159    2    6         90        0     2  38062

[1000 rows x 7 columns]
   camera  age  ram  cpu_score  slot_sd  sims  price
0     110    2    4         89        1     2  31596
1     187    0    4         42        0     1  33530
2     100    2    4         52        0     2   9262
3      22    2    6         82        0     2  62199
4     114    2   12         88        1     1  66287
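Before choosing a model, it can help to run a few quick checks on the table above. The sketch below rebuilds the five rows printed by `df.head()` by hand so that it runs without the CSV file; the last price is taken from the full print of `df`, where row 4 shows 66287. The variable names are illustrative, and on the real data the same calls would be made on `df` itself.

```python
import pandas as pd

# The five rows shown by df.head(), typed in by hand (no CSV needed).
sample = pd.DataFrame({
    "camera":    [110, 187, 100, 22, 114],
    "age":       [2, 0, 2, 2, 2],
    "ram":       [4, 4, 4, 6, 12],
    "cpu_score": [89, 42, 52, 82, 88],
    "slot_sd":   [1, 0, 0, 0, 1],
    "sims":      [2, 1, 2, 2, 1],
    "price":     [31596, 33530, 9262, 62199, 66287],
})

# Column types and missing values per column.
print(sample.dtypes)
missing_per_column = sample.isna().sum()
print(missing_per_column)

# How many distinct target values are there? A count close to the number
# of rows suggests a continuous target, i.e. a regression problem.
n_unique_prices = sample["price"].nunique()
print("Distinct prices:", n_unique_prices, "of", len(sample), "rows")

# Linear correlation of each feature with the target (only indicative
# on five rows; meaningful on the full dataset).
price_corr = sample.corr()["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

On this sample every price is distinct, which is a first hint that `price` should be treated as a number to predict rather than a label to classify.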

In [None]:
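import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv("1750577174509-smartphone_data.csv")

# Features are every column except the last; the target is the last column (price).
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Caution: price is a continuous value, so a classifier treats every distinct
# price as its own class. A regressor such as RandomForestRegressor is the
# usual choice for a target like this.
model = RandomForestClassifier()
model.fit(X, y)

# Caution: predicting on the rows used for fitting mostly reproduces the
# training labels and says nothing about performance on unseen data.
predictions = model.predict(X)

print(predictions)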


[31596 33530  9262 62199 66287 67546 48150 30615 11327 62782 37768 36850
 59411 50788 58409 24848 54830 60446 47542 23160 64583 41998 35230 17076
 11805 39085 63571 22715 13750 56395 21232 61008 43903 14953 40093 33500
 30418 61741 42314 56225 30449 25478 43179 59769 54881 33338 34899 12145
 40101 10775 53961 39647 53619 53838 63780 27539 44738 65124 34616 22119
 33159 62567 45099 18295 42710 69799 63178 32685 57343 33479 54172 41629
 63863 50043 25033 44902 52571 23408 64739 65980 25420 24351 12131 59300
 52825 19349 68931 52254 46225 46847 20453 26814 26263 20158 23735 23830
 55893 52673 69470 31926 53066 42779 11373 18335 31741 18692 55295 53541
 42057  9447 27583 30765 67287 31906 63211 27567 22139 54525 46230 60368
 27520 27702 59934 33566 15657 45688 38134 18345 63476 63479 49185 57928
 35413 13208 41455 57681 36018 62219 53991 16080 29902 59305 17486 46862
 64126 66870 29158 48460 51856 43871 43208 28519 39379 36378 47892 62353
 57996 22031 22536  8446 44683 56555 60261 25618 14
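The predictions printed above begin with the same prices as the first rows of the dataset (31596, 33530, 9262, ...), because the model is asked to predict the very rows it was fitted on. The sketch below illustrates, on synthetic data and with a regressor, how much more optimistic a score on training rows tends to be than a score on held-out rows. The data and all names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): one informative feature plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# score() returns R^2 for regressors.
train_r2 = model.score(X_train, y_train)  # rows the model has already seen
test_r2 = model.score(X_test, y_test)     # rows the model has never seen

print("R^2 on training rows:", round(train_r2, 3))
print("R^2 on held-out rows:", round(test_r2, 3))
```

Only the held-out score gives an estimate of how the model would behave on new data, which is why the data should be split before fitting.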

In [6]:
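import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

data = pd.read_csv("1750577174509-smartphone_data.csv")

# Features are every column except the last; the target is the last column (price).
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Hold out 30% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Caution: accuracy counts exact matches only. With a continuous target such as
# price, nearly every test price is likely a class the model never saw during
# training, so the score ends up at or near zero.
print("Accuracy:", accuracy_score(y_test, y_pred))
print("classification Report:\n", classification_report(y_test, y_pred))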



Accuracy: 0.0
classification Report:
               precision    recall  f1-score   support

        8307       0.00      0.00      0.00       1.0
        8336       0.00      0.00      0.00       1.0
        8406       0.00      0.00      0.00       0.0
        8589       0.00      0.00      0.00       1.0
        8870       0.00      0.00      0.00       0.0
        9051       0.00      0.00      0.00       0.0
        9134       0.00      0.00      0.00       0.0
        9262       0.00      0.00      0.00       1.0
        9447       0.00      0.00      0.00       1.0
        9587       0.00      0.00      0.00       0.0
        9630       0.00      0.00      0.00       0.0
        9657       0.00      0.00      0.00       0.0
        9695       0.00      0.00      0.00       0.0
        9814       0.00      0.00      0.00       1.0
        9848       0.00      0.00      0.00       0.0
        9881       0.00      0.00      0.00       0.0
        9961       0.00      0.00      0.00

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
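The accuracy of 0.0 above is what one would expect when a classifier is asked to hit a continuous price exactly. A more suitable setup, sketched here under assumptions, treats the task as regression and reports error-based metrics. The data below is synthetic: it reuses the column names from the dataset, but the value ranges and the price formula are invented for illustration and are not derived from the real file.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the smartphone data (an assumption): same column
# names as the CSV, made-up value ranges and a made-up price formula.
rng = np.random.default_rng(42)
n = 500
data = pd.DataFrame({
    "camera": rng.integers(8, 200, size=n),
    "age": rng.integers(0, 4, size=n),
    "ram": rng.choice([4, 6, 8, 12, 16], size=n),
    "cpu_score": rng.integers(40, 100, size=n),
    "slot_sd": rng.integers(0, 2, size=n),
    "sims": rng.integers(1, 3, size=n),
})
noise = rng.normal(scale=2000, size=n)
data["price"] = (
    2000 * data["ram"] + 300 * data["cpu_score"] - 3000 * data["age"] + noise
).round().astype(int)

X = data.drop(columns="price")
y = data["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A regressor predicts a number instead of picking one of many "classes".
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Exact-match rate (what accuracy measures) versus regression metrics.
exact_match = float(np.mean(np.round(y_pred) == y_test.to_numpy()))
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Exact-match rate:", exact_match)
print("Mean absolute error:", round(mae, 1))
print("R^2:", round(r2, 3))
```

The exact-match rate stays close to zero even when the predictions are useful, while mean absolute error and R^2 describe how far off the predicted prices are. On the real CSV, the same pattern would apply with `data = pd.read_csv(...)` in place of the synthetic frame.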
