### OCI Data Science - Useful Tips
<details>
<summary><font size="2">Check for Public Internet Access</font></summary>

```python
import requests
# A timeout keeps the cell from hanging when there is no outbound route.
response = requests.get("https://oracle.com", timeout=10)
assert response.status_code == 200, "Internet connection failed"
```
</details>
<details>
<summary><font size="2">Helpful Documentation </font></summary>
<ul><li><a href="https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm">Data Science Service Documentation</a></li>
<li><a href="https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html">ADS documentation</a></li>
</ul>
</details>
<details>
<summary><font size="2">Typical Cell Imports and Settings for ADS</font></summary>

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
</details>
<details>
<summary><font size="2">Useful Environment Variables</font></summary>

```python
import os
print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```
</details>

In [2]:
# Question 1
# What's the version of Pandas that you installed?
import pandas as pd
pd.__version__

'1.5.3'

In [52]:
# Question 2
# How many columns are in the dataset?
df = pd.read_csv('housing.csv')
df

| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20635 | -121.09 | 39.48 | 25.0 | 1665.0 | 374.0 | 845.0 | 330.0 | 1.5603 | 78100.0 | INLAND |
| 20636 | -121.21 | 39.49 | 18.0 | 697.0 | 150.0 | 356.0 | 114.0 | 2.5568 | 77100.0 | INLAND |
| 20637 | -121.22 | 39.43 | 17.0 | 2254.0 | 485.0 | 1007.0 | 433.0 | 1.7000 | 92300.0 | INLAND |
| 20638 | -121.32 | 39.43 | 18.0 | 1860.0 | 409.0 | 741.0 | 349.0 | 1.8672 | 84700.0 | INLAND |
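
The rendered table does not state the column count by itself, so reading it from `shape` is more direct. A minimal sketch, assuming a small hypothetical frame named `sample` (illustrative values, not the housing data) so that it runs without `housing.csv`:

```python
import pandas as pd

# Hypothetical stand-in for the housing data; the values are illustrative only.
sample = pd.DataFrame(
    {
        "longitude": [-122.23, -122.22],
        "latitude": [37.88, 37.86],
        "ocean_proximity": ["NEAR BAY", "NEAR BAY"],
    }
)

# shape is a (rows, columns) tuple; the index is not counted as a column.
n_rows, n_cols = sample.shape
print(n_cols)  # same value as len(sample.columns)
```

On the real frame the equivalent call would be `df.shape[1]`.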


In [5]:
# Question 3
# Which columns in the dataset have missing values?
df.isnull().sum()

longitude               0
latitude                0
housing_median_age      0
total_rooms             0
total_bedrooms        207
population              0
households              0
median_income           0
median_house_value      0
ocean_proximity         0
dtype: int64
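
With many columns, it can help to keep only the entries whose count is non-zero instead of scanning the full listing. A small sketch of that filter, assuming a hypothetical two-column frame `sample` in place of the housing data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in; only total_bedrooms contains a missing value.
sample = pd.DataFrame(
    {
        "total_rooms": [880.0, 7099.0, 1467.0],
        "total_bedrooms": [129.0, np.nan, 190.0],
    }
)

# Count missing values per column, then keep the columns with at least one.
missing_counts = sample.isnull().sum()
cols_with_missing = missing_counts[missing_counts > 0].index.tolist()
print(cols_with_missing)
```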

In [7]:
# Question 4
# How many unique values does the ocean_proximity column have?
df.nunique()

longitude               844
latitude                862
housing_median_age       52
total_rooms            5926
total_bedrooms         1923
population             3888
households             1815
median_income         12928
median_house_value     3842
ocean_proximity           5
dtype: int64
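
Since the question concerns a single column, the count can also be taken from that column alone, together with the distinct labels. A sketch on a hypothetical series `sample` (made-up rows, not the housing data):

```python
import pandas as pd

# Hypothetical stand-in for df['ocean_proximity'].
sample = pd.Series(
    ["NEAR BAY", "INLAND", "ISLAND", "INLAND"], name="ocean_proximity"
)

n_categories = sample.nunique()       # number of distinct non-null values
categories = sorted(sample.unique())  # the distinct labels themselves
print(n_categories, categories)
```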

In [19]:
# Question 5
# What's the average value of the median_house_value for the houses located near the bay?
df[df['ocean_proximity'] == 'NEAR BAY'].median_house_value.mean()

259212.31179039303
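
The boolean filter above answers the question for one category. If the averages for all categories are of interest, a `groupby` produces them in one pass. A sketch on a hypothetical frame `sample` with made-up prices:

```python
import pandas as pd

# Hypothetical stand-in; the prices are illustrative only.
sample = pd.DataFrame(
    {
        "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND"],
        "median_house_value": [400000.0, 300000.0, 100000.0],
    }
)

# Mean of median_house_value for every ocean_proximity category.
mean_by_area = sample.groupby("ocean_proximity")["median_house_value"].mean()
near_bay_mean = mean_by_area["NEAR BAY"]
print(near_bay_mean)
```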

In [53]:
# Question 6
'''
Calculate the average of total_bedrooms column in the dataset.
Use the fillna method to fill the missing values in total_bedrooms with the mean value from the previous step.
Now, calculate the average of total_bedrooms again.
Has it changed?
Hint: take into account only 3 digits after the decimal point.
'''
avg_total_bedrooms = df.total_bedrooms.mean()
avg_total_bedrooms
# original = 537.8705525375618

537.8705525375618

In [50]:
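# Question 6 (continued)
# Fill the missing values with the mean computed in the previous cell.
df['total_bedrooms'] = df['total_bedrooms'].fillna(avg_total_bedrooms)
assert df.total_bedrooms.isnull().sum() == 0  # no missing values remain
avg_total_bedrooms = df.total_bedrooms.mean()
avg_total_bedrooms
# after = 537.8705525375617
# answer = unchanged (to 3 decimal places)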

537.8705525375617
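
The two results differ only in the last printed digit, which is floating-point rounding. Filling n missing entries with the mean m of the k observed ones gives (k·m + n·m) / (k + n) = m, so the mean cannot change. A sketch of that check on a hypothetical series `bedrooms` (not the housing data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['total_bedrooms'] with two missing entries.
bedrooms = pd.Series([129.0, np.nan, 190.0, 235.0, np.nan])

mean_before = bedrooms.mean()          # NaN values are skipped by default
filled = bedrooms.fillna(mean_before)  # returns a new Series
mean_after = filled.mean()

# Compare at 3 decimal places, as the hint in the question suggests.
print(round(mean_before, 3) == round(mean_after, 3))
```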

In [82]:
# Question 7
'''
Select all the options located on islands.
Select only columns housing_median_age, total_rooms, total_bedrooms.
Get the underlying NumPy array. Let's call it X.
Compute matrix-matrix multiplication between the transpose of X and X. To get the transpose, use X.T. Let's call the result XTX.
Compute the inverse of XTX.
Create an array y with values [950, 1300, 800, 1000, 1300].
Multiply the inverse of XTX with the transpose of X, and then multiply the result by y. Call the result w.
What's the value of the last element of w?
'''
import numpy as np
new_df = df[df['ocean_proximity'] == 'ISLAND']
new_df = new_df[['housing_median_age', 'total_rooms', 'total_bedrooms']]
X = new_df.to_numpy()
XTX = X.T @ X
XTX_inv = np.linalg.inv(XTX)
y = np.array([950, 1300, 800, 1000, 1300])
w = (XTX_inv @ X.T) @ y  # w = (X^T X)^-1 X^T y
w[-1]
# answer = 5.6992

5.699229455065586
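
The steps in this cell compute w = (XᵀX)⁻¹Xᵀy, the normal-equation solution of a least-squares fit. As a cross-check, the same w can be obtained without forming the inverse explicitly, by solving the linear system (XᵀX)w = Xᵀy. A sketch under the assumption of a small made-up `X` and `y` (hypothetical values, not the island rows):

```python
import numpy as np

# Hypothetical design matrix and targets; the values are illustrative only.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]])
y = np.array([1.0, 2.0, 3.0])

XTX = X.T @ X

# Route used in the cell above: explicit inverse.
w_inv = np.linalg.inv(XTX) @ X.T @ y

# Alternative: solve (X^T X) w = X^T y directly.
w_solve = np.linalg.solve(XTX, X.T @ y)

print(np.allclose(w_inv, w_solve))
```

Both routes should agree up to floating-point error as long as XᵀX is invertible; `np.linalg.solve` is generally the numerically safer choice.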