# Enhanced Data Analysis Using Python and NumPy
The following tasks demonstrate several ways to use Python and NumPy for data analysis: manipulating arrays, loading CSV data from a URL, and performing statistical computations.

Task 1: Finding Common Elements Between Two Arrays
This task finds the unique elements common to two arrays using NumPy's np.intersect1d, which returns the sorted, distinct values present in both inputs. This is useful when intersecting datasets.

In [1]:
#Task 1: Write a Python/NumPy code block that finds the distinct/unique common items between two arrays.

import numpy as np  # import the NumPy library
a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])  # arrays given in the task
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])
print(np.intersect1d(a, b))  # sorted unique values that appear in both a and b

[2 4]
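As a cross-check, np.intersect1d can also report where the common values occur. With return_indices=True it returns, alongside the common values, the index of the first occurrence of each value in each input array. The sketch below reuses the Task 1 arrays and compares the result with Python's built-in set intersection.

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

# common values plus the index of each value's first occurrence in a and in b
common, idx_a, idx_b = np.intersect1d(a, b, return_indices=True)
print(common, idx_a, idx_b)

# the same unique values via Python sets (sorted to match NumPy's ordering)
from_sets = sorted(set(a.tolist()) & set(b.tolist()))
print(from_sets)
```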


Task 2: Creating a 5x3 Array Using NumPy's Sequencing
This task creates a 5x3 array without typing each value: np.arange generates the integers 1 to 15, reshape arranges them into a 3x5 array, and the transpose (.T) turns that into the required 5x3 array, whose columns count upward.

In [2]:
#Task 2: Create a 5x3 array using knowledge you have of Python’s / NumPy’s sequencing functionality so that you do not need to explicitly key in every integer value.

import numpy as np  # import the NumPy library
arr = np.arange(1, 16).reshape(3, 5)  # integers 1..15 arranged as a 3x5 array
print(arr.T)  # transposing the 3x5 array gives the required 5x3 array

[[ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]
 [ 5 10 15]]
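The transpose is what makes the columns count upward. A plain reshape(5, 3) fills row by row and gives a different 5x3 layout, while reshape(5, 3, order='F') fills column by column and reproduces the transposed result directly. The sketch below compares the three approaches.

```python
import numpy as np

seq = np.arange(1, 16)  # integers 1..15

transposed = seq.reshape(3, 5).T          # approach used in Task 2
row_major = seq.reshape(5, 3)             # fills rows first: [[1, 2, 3], [4, 5, 6], ...]
col_major = seq.reshape(5, 3, order='F')  # fills columns first (Fortran order)

print(row_major)
print(np.array_equal(transposed, col_major))  # same values in the same positions
```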


Task 3: Removing Specific Items from an Array
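Here, we remove from array a every element that also appears in array b, using NumPy's set-difference function np.setdiff1d, a common data-cleaning step. Note that the result is sorted and contains only unique values, so the original order of a is not preserved.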

In [3]:
#Task 3: Write a Python/NumPy code block that removes from array a any items that are also present in array b.

import numpy as np  # import the NumPy library
a = np.array([12, 5, 7, 15, 3, 1, 8])  # arrays given in the assignment
b = np.array([14, 6, 3, 11, 19, 12, 5])
c = np.setdiff1d(a, b)  # set difference: unique values of a that are not in b, sorted
print(c)  # display the result

[ 1  7  8 15]
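Because np.setdiff1d returns a sorted array of unique values, the original order of a is lost. If the order (or duplicate values) matters, a boolean mask built with np.isin keeps the surviving elements of a where they were:

```python
import numpy as np

a = np.array([12, 5, 7, 15, 3, 1, 8])
b = np.array([14, 6, 3, 11, 19, 12, 5])

# True where an element of a does NOT appear anywhere in b
keep = ~np.isin(a, b)
print(a[keep])  # keeps 7, 15, 1 and 8 in their original order
```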


Task 4: Flattening a Multidimensional Array
This task converts a multidimensional array into a one-dimensional array, which is useful for serializing data or for passing it to functions that expect a flat sequence of values.

In [4]:
#Task 4: Transform a multidimensional array into a one-dimensional array.

import numpy as np  # import the NumPy library
arr = np.arange(1, 16).reshape(3, 5)  # the same 3x5 array built in Task 2 (before transposing)
f = arr.flatten()  # flatten() returns a 1D copy in row-major order
f  # as the last expression in the cell, the notebook displays it

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
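flatten() always returns a new copy. NumPy's ravel() produces the same one-dimensional sequence but returns a view of the original data when it can, which avoids copying large arrays; writing through that view then changes the original. The sketch below shows the difference with np.shares_memory.

```python
import numpy as np

arr = np.arange(1, 16).reshape(3, 5)

flat_copy = arr.flatten()  # independent copy
flat_view = arr.ravel()    # a view here, because arr is contiguous in memory

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True

flat_view[0] = 100   # modifies arr through the view
print(arr[0, 0])     # 100
print(flat_copy[0])  # still 1
```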

Task 5: Comprehensive Analysis of Water Consumption Data
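We load a dataset of New York City water consumption from a GitHub repository and analyze it, showing how to work with remote data sources. Judging from the output and the tasks below, the four columns are the year, the population, the NYC consumption in millions of gallons per day, and the per capita daily consumption.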

In [5]:
#Task 5 (pre-task): Read the CSV file in raw form from the GitHub repository.

import pandas  # import the pandas library for reading CSV files
import numpy as np  # import the NumPy library
np.set_printoptions(suppress=True)  # suppress scientific notation when printing arrays
# the CSV was uploaded to GitHub; read it from its raw URL and convert the DataFrame to a NumPy array
nyc_water = pandas.read_csv('https://raw.githubusercontent.com/Vijayasuriya/Water_consumption_AP_ASSI3/main/M3_Data%20(1).csv').to_numpy()
nyc_water  # display the array

array([[   1979. , 7102100. ,    1512. ,     213. ],
       [   1980. , 7071639. ,    1506. ,     213. ],
       [   1981. , 7089241. ,    1309. ,     185. ],
       [   1982. , 7109105. ,    1382. ,     194. ],
       [   1983. , 7181224. ,    1424. ,     198. ],
       [   1984. , 7234514. ,    1465. ,     203. ],
       [   1985. , 7274054. ,    1326. ,     182. ],
       [   1986. , 7319246. ,    1351. ,     185. ],
       [   1987. , 7342476. ,    1447. ,     197. ],
       [   1988. , 7353719. ,    1484. ,     202. ],
       [   1989. , 7344175. ,    1402. ,     191. ],
       [   1990. , 7335650. ,    1424. ,     194. ],
       [   1991. , 7374501. ,    1469. ,     199. ],
       [   1992. , 7428944. ,    1369. ,     184. ],
       [   1993. , 7506166. ,    1368.5,     182. ],
       [   1994. , 7570458. ,    1357.7,     179. ],
       [   1995. , 7633040. ,    1325.7,     174. ],
       [   1996. , 7697812. ,    1297.9,     169. ],
       [   1997. , 7773443. ,    1205.5,     1
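The years in the output appear as floats (1979.) because DataFrame.to_numpy() converts all columns to one common dtype, here float64. The sketch below demonstrates that conversion on three rows copied from the output above; the column names are hypothetical (the real CSV header may differ). It also gives names to the column positions that the later tasks index by number.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical header; the three data rows are copied from the output above.
sample_csv = io.StringIO(
    "year,population,consumption_mgd,per_capita\n"
    "1992,7428944,1369,184\n"
    "1993,7506166,1368.5,182\n"
    "1994,7570458,1357.7,179\n"
)

df = pd.read_csv(sample_csv)
print(df.dtypes)      # consumption_mgd is read as float, the other columns as integers
data = df.to_numpy()  # mixed integer/float columns are upcast to a single float64 array
print(data.dtype)

# Named positions for the columns used in Tasks 5.1 to 5.4
YEAR, POPULATION, CONSUMPTION, PER_CAPITA = 0, 1, 2, 3
print(data[:, CONSUMPTION].max())  # same pattern as np.max(nyc_water[:, 2])
```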

In [6]:
#Task 5.1: What is the maximum yearly NYC consumption of water in millions of gallons per day?

import pandas  # import the pandas library for reading CSV files
import numpy as np  # import the NumPy library
np.set_printoptions(suppress=True)  # suppress scientific notation
# reload the data from the raw GitHub URL so this cell runs on its own
nyc_water = pandas.read_csv('https://raw.githubusercontent.com/Vijayasuriya/Water_consumption_AP_ASSI3/main/M3_Data%20(1).csv').to_numpy()
np.max(nyc_water[:, 2])  # slice column 2 (NYC consumption) and take its maximum

1512.0
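The maximum alone does not say when it occurred. np.argmax returns the row index of the largest value, which can then be used to look up the year in column 0. The sketch below uses a few rows copied from the output above as a stand-in for the full array; with the full nyc_water the same lines apply unchanged.

```python
import numpy as np

# Stand-in for nyc_water: rows 1979-1983 copied from the output above
# (columns: year, population, consumption, per capita)
sample = np.array([
    [1979, 7102100, 1512, 213],
    [1980, 7071639, 1506, 213],
    [1981, 7089241, 1309, 185],
    [1982, 7109105, 1382, 194],
    [1983, 7181224, 1424, 198],
], dtype=float)

row = np.argmax(sample[:, 2])    # row index of the highest consumption
peak_year = int(sample[row, 0])  # year stored in that row
peak_value = sample[row, 2]
print(peak_year, peak_value)
```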

In [7]:
#Task 5.2: How many calendar years are represented within this data set? NumPy's shape command is one way to find out.

import pandas  # import the pandas library for reading CSV files
import numpy as np  # import the NumPy library
np.set_printoptions(suppress=True)  # suppress scientific notation
# reload the data from the raw GitHub URL so this cell runs on its own
nyc_water = pandas.read_csv('https://raw.githubusercontent.com/Vijayasuriya/Water_consumption_AP_ASSI3/main/M3_Data%20(1).csv').to_numpy()
total_number_of_years = nyc_water.shape[0]  # number of rows; each row holds one year
total_number_of_years  # display the result

41
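shape[0] counts rows, which equals the number of calendar years only if each year appears exactly once. A quick check, sketched here on a stand-in year column, compares the row count with the number of unique years and with the span from the first to the last year.

```python
import numpy as np

# Stand-in for nyc_water[:, 0]; with the real data, use years = nyc_water[:, 0]
years = np.array([1979., 1980., 1981., 1982., 1983.])

n_rows = years.shape[0]
n_unique = np.unique(years).size           # distinct years
span = int(years.max() - years.min()) + 1  # years from first to last, inclusive

print(n_rows, n_unique, span)
# If all three agree, every year in the range appears exactly once
print(n_rows == n_unique == span)
```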

In [8]:
#Task 5.3: What is the mean and the standard deviation of the per capita daily water consumption?

import pandas  # import the pandas library for reading CSV files
import numpy as np  # import the NumPy library
np.set_printoptions(suppress=True)  # suppress scientific notation
# reload the data from the raw GitHub URL so this cell runs on its own
nyc_water = pandas.read_csv('https://raw.githubusercontent.com/Vijayasuriya/Water_consumption_AP_ASSI3/main/M3_Data%20(1).csv').to_numpy()
m = np.mean(nyc_water[:, 3])  # mean of column 3, the per capita daily consumption
s_d = np.std(nyc_water[:, 3])  # standard deviation of the same column (NumPy's default, ddof=0)
# print both results
print('The Mean of the per capita daily water consumption is:', m, '\nThe Standard deviation of the per capita daily water consumption is:', s_d)

The Mean of the per capita daily water consumption is: 158.41463414634146 
The Standard deviation of the per capita daily water consumption is: 31.847177467886052
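np.std computes the population standard deviation by default (ddof=0, dividing by n). If the 41 years are treated as a sample, the sample standard deviation divides by n - 1 instead, which is requested with ddof=1. The two are related by s_sample = s_pop * sqrt(n / (n - 1)), so with n = 41 the value above would rise to about 32.24. The sketch below verifies that relationship on a few per capita values copied from the output.

```python
import numpy as np

# Stand-in for nyc_water[:, 3]: per capita values for 1979-1983 from the output above
per_capita = np.array([213., 213., 185., 194., 198.])
n = per_capita.size

pop_sd = np.std(per_capita)             # ddof=0: divides by n (NumPy's default)
sample_sd = np.std(per_capita, ddof=1)  # ddof=1: divides by n - 1

# The two differ only by the factor sqrt(n / (n - 1))
print(pop_sd, sample_sd)
print(np.isclose(sample_sd, pop_sd * np.sqrt(n / (n - 1))))
```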


In [9]:
#Task 5.4: What is the increase or decrease in population from year to year? Use NumPy's `diff` function to
#create an array of differences and save that to a variable called "pop_diff", then print that variable to the screen.

import pandas  # import the pandas library for reading CSV files
import numpy as np  # import the NumPy library
np.set_printoptions(suppress=True)  # suppress scientific notation
# reload the data from the raw GitHub URL so this cell runs on its own
nyc_water = pandas.read_csv('https://raw.githubusercontent.com/Vijayasuriya/Water_consumption_AP_ASSI3/main/M3_Data%20(1).csv').to_numpy()

pop_diff = np.diff(nyc_water[:, 1])  # difference between consecutive population values (column 1)
pop_diff  # display the array of year-to-year changes

array([-30461. ,  17602. ,  19864. ,  72119. ,  53290. ,  39540. ,
        45192. ,  23230. ,  11243. ,  -9544. ,  -8525. ,  38851. ,
        54443. ,  77222. ,  64292. ,  62582. ,  64772. ,  75631. ,
        84816. ,  89401. ,  60618. ,  16685.5,  16685.5,  16685.5,
        16685.5,  16685.5,  16685.5,  16685.5,  16685.5,  16685.5,
        16685.5,  97830. ,  75069. ,  50707. ,  38648. ,  30794. ,
         7795. , -37705. , -39523. , -61931. ])
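Each entry of pop_diff is the change from one year to the next, so it lines up with years[1:] (the later year of each pair). Dividing by the earlier year's population turns the changes into percentages, which are easier to compare across decades. The sketch below works on the first three rows from the output as stand-ins for the full columns.

```python
import numpy as np

# Stand-ins for nyc_water[:, 0] and nyc_water[:, 1] (1979-1981 rows from the output above)
years = np.array([1979., 1980., 1981.])
population = np.array([7102100., 7071639., 7089241.])

pop_diff = np.diff(population)  # population[i + 1] - population[i]
pct_change = pop_diff / population[:-1] * 100

for year, change, pct in zip(years[1:].astype(int), pop_diff, pct_change):
    print(f"{year}: {change:+,.0f} ({pct:+.2f}%)")
```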

In summary, this notebook used Python and NumPy to address a variety of analytical problems, from array operations to statistical analysis, illustrating the flexibility of these tools for data science.