In [None]:
#Step 1 Import the Boston Dataset and libraries
from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
boston = load_boston()


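This code imports the load_boston function from the sklearn.datasets module and imports the matplotlib.pyplot module under the alias plt. The variable boston holds the object returned by load_boston(), not the function itself. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so this cell requires an older scikit-learn release.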

Then, it calls the load_boston function to load the Boston Housing dataset, which describes 506 census tracts in the Boston area: 13 features per tract and the corresponding median home value.

The data is returned as a dictionary-like object with the following attributes:

* data: the data points for each feature
* target: the median values of owner-occupied homes in $1000s
* feature_names: the names of the features
* DESCR: a description of the dataset

Once the data is loaded, it can be used for various machine learning tasks such as regression and feature selection.

The matplotlib.pyplot module is used for creating visualizations of the data. This cell only imports it; the first plot is drawn in Step 5.
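The dictionary-like object described above can be explored with a minimal sketch like the following. It uses load_diabetes as a stand-in, since that dataset is still bundled with scikit-learn releases that no longer include load_boston. The choice of dataset is an assumption made for illustration and is not part of the original exercise; only the access pattern carries over.

```python
from sklearn.datasets import load_diabetes

# Stand-in dataset (assumption): load_diabetes returns the same kind of
# dictionary-like Bunch object that load_boston returned.
dataset = load_diabetes()

# Fields can be read with dot notation or like dictionary keys.
print(dataset.data.shape)        # feature matrix
print(dataset["target"].shape)   # target vector, accessed by key
print(dataset.feature_names)     # list of column names
print(dataset.DESCR[:200])       # first part of the description text
```

Both access styles return the same underlying objects, so dataset.data and dataset["data"] can be used interchangeably.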

# Exercise 1

In [None]:
#Step 2 extract data and target
x = boston.data
y = boston.target

This code assigns the data attribute of the boston object to the variable x, and the target attribute of the boston object to the variable y.

x is a NumPy array containing the features. It has shape (506, 13): 506 rows (one per census tract, the unit of observation in this dataset) and 13 columns (one per feature).

y is a NumPy array containing the corresponding median value of owner-occupied homes for each row. It has shape (506,).

By splitting the dataset into x and y arrays, we can use them as input and output variables, respectively, for various machine learning algorithms.
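The same feature/target split can be sketched on a synthetic array, which is useful when the data arrives as one combined table. The array below is random and only mimics the 506 x 13 layout; the variable name table is hypothetical.

```python
import numpy as np

# Synthetic stand-in (assumption): 506 rows, 13 feature columns
# followed by 1 target column, filled with random numbers.
rng = np.random.default_rng(0)
table = rng.normal(size=(506, 14))

x = table[:, :-1]   # every column except the last: the features
y = table[:, -1]    # the last column: the target

print(x.shape)
print(y.shape)
print(x.ndim, y.ndim)   # x is 2-dimensional, y is 1-dimensional
```

Most scikit-learn estimators expect exactly this layout: a 2-dimensional feature array and a 1-dimensional target array with the same number of rows.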

In [None]:
#Step 3 print data and target size
print(x.shape) 
print(y.shape)


The output is (506, 13) for x and (506,) for y: x has 506 rows and 13 columns (one for each feature), and y has 506 elements, one target value per row (the median value of owner-occupied homes in $1000s).

In [None]:
#Step 4 print dataset description
print(boston.DESCR)


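This code prints the DESCR attribute of the boston object, a text description of the Boston Housing dataset that documents each feature and the target.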

In [None]:
print(y)

Running print(y) after the previous steps displays the target array: a NumPy array of 506 median values of owner-occupied homes in $1000s, one per row of the dataset.

In [None]:
#Step 5 plot target with histogram
plt.hist(y, bins=30, color='b', density=True, stacked=True) 


This code will create a histogram of the target values y using matplotlib.

Here's what each argument does:

* y: the data to be plotted
* bins: the number of bins to use in the histogram
* color: the color of the bars in the histogram (in this case, blue)
* density: if True, the histogram will be normalized so that the area under the bars adds up to 1
* stacked: if True, multiple histograms will be stacked on top of each other (this doesn't apply here, since we're only plotting one histogram)

To display the histogram in a script, call plt.show() after plt.hist(). In a Jupyter notebook, the figure is usually rendered automatically when the cell runs.
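The effect of density=True can be checked numerically. The sketch below uses np.histogram, which computes the same bin heights without drawing anything, on synthetic values that stand in for y (the numbers are made up).

```python
import numpy as np

# Synthetic stand-in for the target values (assumption).
rng = np.random.default_rng(0)
values = rng.normal(loc=22.5, scale=9.0, size=506)

# density=True rescales the bar heights so the total area is 1.
heights, edges = np.histogram(values, bins=30, density=True)
widths = np.diff(edges)                  # width of each of the 30 bins
area = float(np.sum(heights * widths))   # sum of bar areas

print(len(heights), len(edges))
print(round(area, 6))
```

Note that with density=True the bar heights are densities, not probabilities, so an individual bar can be taller than 1 when the bins are narrow.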

In [None]:
#Step 6 Import libraries
import seaborn as sns
import pandas as pd



This code imports the seaborn library for data visualization and the pandas library for data manipulation.

seaborn provides high-level interfaces for creating informative and attractive statistical graphics in Python, while pandas is a powerful library for data manipulation and analysis.

After importing these libraries, you can use functions and methods from these libraries to create visualizations and manipulate data, respectively.

In [None]:
#Step 7 convert to dataframe

df = pd.DataFrame(x,columns=boston.feature_names)


This code creates a pandas DataFrame named df from the x array loaded from the Boston Housing dataset. The column names of the DataFrame are set to the feature names of the dataset, which are loaded from the boston object using boston.feature_names.

Each row in the DataFrame corresponds to one census tract in the dataset, and the columns represent its features, such as the average number of rooms per dwelling (RM), the per-capita crime rate (CRIM), and the proportion of owner-occupied units built before 1940 (AGE). Named columns let you select, filter, and summarize the data with pandas methods instead of tracking column positions by hand.
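A minimal sketch of the same conversion on a tiny array shows what the named columns make possible. The numbers are made up, and only three of the thirteen column names are used.

```python
import numpy as np
import pandas as pd

# Made-up values for three rows and three features (assumption).
values = np.array([[0.1, 6.5, 65.0],
                   [0.3, 6.1, 78.9],
                   [0.2, 7.0, 45.5]])
names = ["CRIM", "RM", "AGE"]

small_df = pd.DataFrame(values, columns=names)

print(small_df.head())          # first rows, with column labels
print(small_df["RM"].mean())    # select a column by name and summarize it
print(small_df.describe())      # count, mean, std, quartiles per column
```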

In [None]:
#Step 8 add target(y) as a new column named 'MEDV' to dataframe
df['MEDV'] = pd.Series(y)


This code adds a new column to the pandas DataFrame df containing the target variable y from the Boston Housing dataset. The new column is named 'MEDV', which stands for "median value of owner-occupied homes in $1000s", as specified in the dataset description.

The pd.Series() function is used to convert the y array into a pandas Series, which can be easily added as a new column to the DataFrame.

This modifies df in place rather than creating a new DataFrame: df now holds all the original features from the Boston Housing dataset plus a MEDV column containing the target variable. This allows you to analyze the relationships between the features and the target variable using pandas functions and methods.
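One detail worth knowing about pd.Series() in this step: a Series is matched to the DataFrame by index label, not by position. In the notebook both df and pd.Series(y) use the default 0..505 index, so the values line up. The sketch below, with made-up values and a deliberately non-default index, shows what happens when they do not.

```python
import numpy as np
import pandas as pd

# Made-up frame whose index does NOT start at 0 (assumption).
frame = pd.DataFrame({"RM": [6.5, 6.1, 7.0]}, index=[10, 11, 12])
target = np.array([24.0, 21.6, 34.7])

# A plain array is assigned by position.
frame["MEDV_array"] = target

# A Series is aligned on index labels; labels 0, 1, 2 do not
# exist in frame, so every value becomes NaN.
frame["MEDV_series"] = pd.Series(target)

print(frame)
```

Assigning the array directly (df['MEDV'] = y) is therefore a slightly more robust alternative when the DataFrame index may have been changed.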

In [None]:
#Step 9 Print dataframe
print(df)


In [None]:
#Step 9 plotting pairwise relationships in a dataset
g = sns.PairGrid(df,height=1)
g.map(plt.scatter)
plt.show()


This code creates a pair plot using seaborn and matplotlib, which shows the pairwise relationships between different features of the Boston Housing dataset.

The PairGrid() function from seaborn creates a grid of subplots for each pair of features in the DataFrame df, and the height argument specifies the height of each subplot in inches.

The map() method is used to apply a plotting function to each subplot in the grid. In this case, plt.scatter is used to create a scatter plot for each pair of features.

Finally, plt.show() is used to display the plot.

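This type of plot can be useful for identifying patterns and correlations in the data, as well as outliers and other anomalies. The scatter plots show how each feature varies with respect to other features in the dataset. Because g.map() applies plt.scatter to every cell, the diagonal cells plot each feature against itself and appear as a straight line; they do not show the distribution of the feature unless a separate function is mapped onto the diagonal.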
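A minimal sketch of a PairGrid that puts histograms on the diagonal uses map_diag() and map_offdiag() instead of map(). The data is random, the column names A, B, C are hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data with three columns (assumption).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 3)), columns=["A", "B", "C"])

g = sns.PairGrid(demo, height=1.5)
g.map_diag(plt.hist)         # diagonal cells: distribution of each column
g.map_offdiag(plt.scatter)   # all other cells: pairwise scatter plots

print(g.axes.shape)          # one subplot per pair of columns
# In an interactive session, call plt.show() here to display the grid.
```

With 14 columns, as in df, the grid has 196 subplots, so plotting a subset of columns is often easier to read.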

In [None]:
#Step 10 plotting correlation relationships in a dataset
fig = plt.figure(figsize=(12, 12))
sns.heatmap(df.corr(), vmax=1, square=True, annot=True)
plt.show()

This code creates a heatmap using seaborn and matplotlib to visualize the correlation between different features of the Boston Housing dataset.

The figsize argument in plt.figure() is used to specify the size of the figure in inches.

The heatmap() function from seaborn creates a heatmap of the correlation matrix between the features in the DataFrame df. The vmax argument specifies the maximum value for the color scale, which is set to 1 to indicate perfect positive correlation. The square argument specifies that the heatmap should have square cells, and the annot argument specifies that the values of the correlation coefficients should be displayed in each cell of the heatmap.

Finally, plt.show() is used to display the plot.

This type of plot can be useful for identifying strong correlations between features in the dataset, which can be used to inform feature selection and modeling decisions. Strong correlations between features can indicate redundancy in the data or potential collinearity, which can affect the performance of predictive models.
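The correlation matrix behind the heatmap can also be inspected directly. The sketch below builds a small made-up DataFrame in which some columns are exact linear functions of each other, then lists the pairs whose absolute correlation exceeds a threshold. The threshold of 0.9 and all names are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up data (assumption): b and c are linear functions of a.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],   # b = 2 * a, perfectly correlated
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],    # c = 6 - a, perfectly anti-correlated
    "d": [1.0, 3.0, 2.0, 5.0, 4.0],    # loosely related to a
})

corr = demo.corr()   # Pearson correlation between every pair of columns
print(corr.round(2))

# Collect each pair once (upper triangle only) when |r| is above the threshold.
threshold = 0.9
strong_pairs = [
    (row, col)
    for i, row in enumerate(corr.index)
    for col in corr.columns[i + 1:]
    if abs(corr.loc[row, col]) > threshold
]
print(strong_pairs)
```

Pairs found this way are candidates for dropping one of the two features before fitting a linear model.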

# Exercise 2

In [None]:
# urllib
# Step 1 import libraries
import urllib.request
# no custom request headers are set here
# Step 2 collect data via url
urr = urllib.request.urlopen('https://www.dmu.ac.uk/')
content = urr.read()
urr.close()
# Step 3 decode content and print it
html = content.decode()
print(html)


This code uses the urllib library in Python to collect the HTML content of a web page.

In Step 1, the urllib.request module is imported, which provides functions for making HTTP requests and handling responses.

In Step 2, the urllib.request.urlopen() function is used to open a URL and return a file-like object that can be used to read the content of the web page. The URL in this case is 'https://www.dmu.ac.uk/', which is the website for De Montfort University in the UK.

The read() method is then used to read the contents of the web page, which are stored in the content variable. The close() method is used to close the file-like object and release any resources used to access the web page.

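In Step 3, the decode() method is used to convert the content of the web page from bytes to a string. Called without arguments, decode() assumes UTF-8, so a page served in a different encoding may raise a UnicodeDecodeError or produce garbled characters. The resulting string is then printed to the console using the print() function.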
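The read, close, and decode steps can be sketched with a with statement, which closes the response automatically, and with the character set taken from the response headers when one is declared. To keep the sketch runnable offline, a data: URL stands in for the real web page; that URL and its contents are assumptions for illustration.

```python
import urllib.request

# A data: URL stands in for a real page so no network access is needed
# (assumption). It encodes a short HTML snippet and declares its charset.
demo_url = "data:text/html;charset=utf-8,%3Ch1%3ECaf%C3%A9%3C%2Fh1%3E"

# The with statement closes the response even if reading fails.
with urllib.request.urlopen(demo_url) as response:
    raw = response.read()   # bytes
    # Use the declared charset if present, otherwise fall back to UTF-8.
    charset = response.headers.get_content_charset() or "utf-8"

# errors="replace" substitutes undecodable bytes instead of raising.
html = raw.decode(charset, errors="replace")

print(type(raw).__name__, type(html).__name__)
print(html)
```

The same pattern works with an https:// address in place of demo_url.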

This code can be useful for collecting data from web pages for analysis or scraping purposes, although it is important to respect the terms of use and copyright restrictions of the website being accessed.


In [None]:
# requests.get
# Step 1 import libraries
import requests
# Step 2 get data and print
rer = requests.get('https://www.dmu.ac.uk/')
print(rer.status_code)
html = rer.text
print(html)


This code uses the requests library in Python to collect the HTML content of a web page.

In Step 1, the requests module is imported, which provides functions for making HTTP requests and handling responses.

In Step 2, the requests.get() function is used to send an HTTP GET request to the specified URL, which is 'https://www.dmu.ac.uk/' in this case. The response from the server is returned as a Response object, which is assigned to the variable rer.

The status_code attribute of the Response object is then printed using the print() function, which displays the HTTP status code of the response. A status code of 200 indicates that the request was successful and the web page content was returned.

The text attribute of the Response object is then assigned to the variable html, which contains the HTML content of the web page as a string. This string is then printed to the console using the print() function.

This code can be useful for collecting data from web pages for analysis or scraping purposes, although it is important to respect the terms of use and copyright restrictions of the website being accessed. The requests library provides more advanced features for working with HTTP requests and responses, such as setting headers, sending data, and handling errors.
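The header and error-handling features mentioned above might be used as in the following sketch. The helper name fetch_html, the User-Agent string, and the timeout value are all hypothetical. The second half builds a request without sending it, so the headers can be inspected offline.

```python
import requests

USER_AGENT = "course-notebook/0.1"   # made-up identifier (assumption)


def fetch_html(url, timeout=10):
    """Hypothetical helper: return the page text or raise on failure."""
    response = requests.get(
        url,
        headers={"User-Agent": USER_AGENT},
        timeout=timeout,          # give up instead of waiting indefinitely
    )
    response.raise_for_status()   # raises requests.HTTPError on 4xx/5xx
    return response.text


# Build (but do not send) the same request to see what would go out.
prepared = requests.Request(
    "GET", "https://www.dmu.ac.uk/", headers={"User-Agent": USER_AGENT}
).prepare()

print(prepared.method, prepared.url)
print(prepared.headers["User-Agent"])
```

Calling fetch_html('https://www.dmu.ac.uk/') would perform the actual request; wrapping that call in try/except requests.RequestException catches connection errors, timeouts, and HTTP errors together.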

# Weather API

In [None]:
import urllib.request
import json

keyAPI = '90f8a247ee90dba3382c324d5002359f'
# This free API key allows no more than 1 request per second
# If it fails, you can register your own key for free
lat = '50.80'
lon = '-1.09'
#Using the longitude and latitude of Portsmouth to retrieve weather data
urlAPI = "http://api.openweathermap.org/data/2.5/weather?lat="+lat+"&lon="+lon+"&APPID="+keyAPI

response = urllib.request.urlopen(urlAPI)
content = response.read()
data = json.loads(content)
# data is a Python dictionary

print('City:')
print(data['name'])
print('Weather:')
print(data['weather'][0]['description'])
# data['weather'] is a list, so take its first element
print('Temperature (in Kelvin):')
print(data['main']['temp'])
print('Wind speed:')
print(data['wind']['speed'])

This code retrieves weather data for a specified location using the OpenWeatherMap API.

In Step 1, the urllib.request and json modules are imported, which provide functions for making HTTP requests and handling JSON-formatted data, respectively.

In Step 2, a free API key for the OpenWeatherMap service is assigned to the variable keyAPI. This key is used to authenticate requests to the API and is limited to one request per second.

In Step 3, the latitude and longitude of a location are assigned to the variables lat and lon, respectively. In this case, the location is Portsmouth, UK.

In Step 4, the URL for the OpenWeatherMap API is constructed using the latitude, longitude, and API key. The URL is then used to send an HTTP GET request to the API using the urllib.request.urlopen() function, and the response is assigned to the variable response.
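As an alternative to string concatenation, the query string can be assembled with urllib.parse.urlencode, which also escapes any special characters in the values. This is a sketch of one way to build the same URL; the key shown is a placeholder, not a working key.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "http://api.openweathermap.org/data/2.5/weather"
params = {
    "lat": "50.80",
    "lon": "-1.09",
    "APPID": "YOUR_API_KEY",   # placeholder (assumption), replace with a real key
}

# urlencode joins the pairs as key=value&key=value and escapes the values.
url_api = base_url + "?" + urlencode(params)
print(url_api)

# Parsing the URL back confirms what the server would receive.
query = parse_qs(urlsplit(url_api).query)
print(query["lat"], query["lon"])
```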

In Step 5, the content of the response is read using the read() method, and the resulting bytes are parsed as JSON data using the json.loads() function. The resulting dictionary is assigned to the variable data.

In Step 6, various weather-related data points are extracted from the data dictionary and printed to the console using the print() function. These include the name of the city, a description of the weather, the temperature in Kelvin, and the wind speed.
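The nested lookups in this step can be tried without calling the API. The JSON string below is a made-up sample that copies only the keys used in the code above; its values, and the Kelvin-to-Celsius conversion added at the end, are illustrative assumptions.

```python
import json

# Made-up sample response (assumption) with the same keys the code reads.
sample = """
{
  "name": "Portsmouth",
  "weather": [{"description": "broken clouds"}],
  "main": {"temp": 288.15},
  "wind": {"speed": 4.1}
}
"""

data = json.loads(sample)   # JSON object -> dict, JSON array -> list

city = data["name"]
# "weather" is a list of dictionaries, so index it before using a key.
description = data["weather"][0]["description"]
temp_kelvin = data["main"]["temp"]
temp_celsius = temp_kelvin - 273.15   # Kelvin to Celsius

# .get() returns None instead of raising KeyError when a key is missing.
gust = data["wind"].get("gust")

print(city, description)
print(round(temp_celsius, 2))
print(gust)
```

Using .get() for optional fields keeps the code from failing when a response omits a value.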

Overall, this code demonstrates how to use Python to access data from an external API and parse it as JSON data for further analysis.