# Data Graph Explorer

### Tasks / Features:
- Load a .csv file in one of three ways
  - uploading it from the local computer
  - reading it from a URL entered by the user
  - reading it from a default URL set in the code
- Use the pandas library to load the .csv into a DataFrame
- Print the column headings and the first two rows
- Store the column names as a list
- Choose one or two columns and convert the data to NumPy arrays
- Display the data as a scatter plot or a line graph
- Repeat this for different column combinations and interpret the graphs
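
The steps above can be sketched without any user interaction. The snippet below is a minimal illustration, not the notebook's actual code: it reads a small made-up CSV from an in-memory string (the column names `height` and `weight` and their values are invented for the example), prints the first two rows, lists the columns, and converts two columns to NumPy arrays.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample data standing in for an uploaded or downloaded .csv file.
csv_text = """height,weight
150,52.5
160,61.0
170,68.2
"""

df_demo = pd.read_csv(io.StringIO(csv_text))

# Column headings and the first two rows.
print(df_demo.head(2))

# Column names as a plain Python list.
demo_columns = df_demo.columns.to_list()

# Convert the two chosen columns to NumPy arrays, ready for plotting.
heights = df_demo[demo_columns[0]].to_numpy()
weights = df_demo[demo_columns[1]].to_numpy()
```

The names `df_demo` and `demo_columns` are chosen so that running this sketch does not overwrite the `df` and `columns` variables used in the cells below.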

In [23]:
!pip install pandas requests numpy matplotlib scikit-learn



In [24]:
import io

import pandas as pd
import requests

DEFAULT_URL = 'https://raw.githubusercontent.com/plotly/datasets/refs/heads/master/diabetes.csv'

print(f"""
------- Data Graph Explorer -------
Step 1: Get a .csv file

Please select one of the options:
[1] From local computer
[2] Using url
[3] Using default url

(Default .csv file url: {DEFAULT_URL})
-----------------------------------""")
# Compare the choice as text so that non-numeric input does not raise ValueError.
selected_opt = input('Input an option: ').strip()
print('-----------------------------------')

df = None
if selected_opt == '1':
  # google.colab only exists inside Colab, so import it only when needed.
  from google.colab import files
  uploaded = files.upload()
  if uploaded:
    # Only the first uploaded file is used.
    first_file = next(iter(uploaded.values()))
    df = pd.read_csv(io.BytesIO(first_file))
  else:
    print('No file was uploaded')
elif selected_opt in ('2', '3'):
  url = input('Input the URL of the file: ').strip() if selected_opt == '2' else DEFAULT_URL
  response = requests.get(url, timeout=30)
  # Stop on HTTP errors instead of parsing an error page as CSV.
  response.raise_for_status()
  df = pd.read_csv(io.StringIO(response.content.decode('utf-8')))
else:
  # Leave df as None rather than calling exit(), which would stop the notebook kernel.
  print('Invalid option')



------- Data Graph Explorer -------
Step 1: Get a .csv file

Please select one of the options:
[1] From local computer
[2] Using url
[3] Using default url

(Default .csv file url: https://raw.githubusercontent.com/plotly/datasets/refs/heads/master/diabetes.csv)
-----------------------------------
Input an option: 2
-----------------------------------
Input the URL of the file: https://raw.githubusercontent.com/plotly/datasets/refs/heads/master/earthquake.csv


In [25]:
# Show the column headings and the first two rows.
df.head(2)

Unnamed: 0,Unknown Number of Deaths,0 Deaths,1-50 Deaths,51-100 Deaths,101-1000 Deaths,>1001 Deaths,"Unknown Number of Deaths, lat","Unknown Number of Deaths, lon","0 Deaths, lat","0 Deaths, lon","1-50 Deaths, lat","1-50 Deaths, lon","51-100 Deaths, lat","51-100 Deaths, lon","101-1000 Deaths, lat","101-1000 Deaths, lon",">1001 Deaths, lat",">1001 Deaths, lon"
0,"New Caledonia: Noumea; Vanuatu: Port-Vila, 1995","Solomon Islands: Santa Cruz Islands; Vanuatu,...","Philippines: Mindanao: Talakag-Malaybalay, 1987","Switzerland, 1801","Switzerland, 1584","Philippines: Mindanao: S, 1976",-23.008,169.9,-12.584,166.676,8.047,125.41,46.9,8.6,46.3,7,6.292,124.09
1,"Pakistan: Battagram, 2015","Iran: Western: Masjed-E- Soleyman, 2003","Pakistan: Quetta, Nushki, 1978","Venezuela: Cariaco-Cumana, 1997","Switzerland, 1584","China: Sichuan Province: Kangding, 1725",34.658,73.302,31.953,49.209,29.926,66.302,10.598,-63.486,46.2,7,30.1,101.9


In [26]:
import numpy as np

columns = df.columns.to_list()

print("""-----------------------------------
Available columns:""")
print('\n'.join([f" {i}). {c}" for i, c in enumerate(columns)]))

print("-----------------------------------")
first_c = input("Select first column: ").strip()
second_c = input("Select second column (enter to skip it): ").strip()

# Convert the selected columns to NumPy arrays; the second column is optional.
first_r = df[columns[int(first_c)]].to_numpy()
second_r = df[columns[int(second_c)]].to_numpy() if second_c else np.empty(0)

# Abstract NumPy scalar types: together they cover every integer,
# floating-point and complex width.
supported_types = (np.integer, np.floating, np.complexfloating)

print("-----------------------------------")
if not isinstance(first_r[0], supported_types):
  print("First column data type isn't supported.")
elif len(second_r) > 0 and not isinstance(second_r[0], supported_types):
  print("Second column data type isn't supported.")
else:
  print(f"""Number of values selected:
First\t: {len(first_r)}
Second\t: {len(second_r)}
  """)

-----------------------------------
Available columns:
 0). Unknown Number of Deaths
 1). 0 Deaths
 2). 1-50 Deaths
 3). 51-100 Deaths
 4). 101-1000 Deaths
 5). >1001 Deaths
 6). Unknown Number of Deaths, lat
 7). Unknown Number of Deaths, lon
 8). 0 Deaths, lat
 9). 0 Deaths, lon
 10). 1-50 Deaths, lat
 11). 1-50 Deaths, lon
 12). 51-100 Deaths, lat
 13). 51-100 Deaths, lon
 14). 101-1000 Deaths, lat
 15). 101-1000 Deaths, lon
 16). >1001 Deaths, lat
 17). >1001 Deaths, lon
-----------------------------------
Select first column: 0
Select second column (enter to skip it): 1
-----------------------------------
First column data type isn't supported.
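
The run above stops because column 0 holds place names rather than numbers. One way to avoid this is to offer only numeric columns for selection. The sketch below uses pandas' `DataFrame.select_dtypes` on a small made-up frame; the helper name `numeric_columns` and the sample values are assumptions for illustration and are not part of the notebook.

```python
import pandas as pd

def numeric_columns(frame):
  """Return the names of the columns whose dtype is numeric."""
  return frame.select_dtypes(include="number").columns.to_list()

# Made-up sample mixing a text column with two numeric columns.
sample = pd.DataFrame({
  "place": ["A", "B"],
  "lat": [10.0, 20.5],
  "lon": [1.5, 2.5],
})

print(numeric_columns(sample))  # ['lat', 'lon']
```

Applied to the earthquake data, `numeric_columns(df)` would presumably keep the `lat` and `lon` columns and drop the text columns 0 to 5, although that depends on how pandas parsed each column.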


In [27]:

%matplotlib inline
import matplotlib.pyplot as plt
# Import the estimator explicitly: depending on the scikit-learn version,
# `import sklearn` alone may not make `sklearn.linear_model` available.
from sklearn.linear_model import LinearRegression

def display_graph(first_r, second_r):
  """Scatter-plot the data and overlay a least-squares regression line in red."""
  if len(second_r) > 0:
    # Two columns: plot the second against the first.
    x, y = first_r, second_r
  else:
    # One column: plot its values against their row index.
    x, y = np.arange(len(first_r)), first_r

  model = LinearRegression()
  model.fit(x.reshape(-1, 1), y)

  fig, ax = plt.subplots()
  ax.scatter(x, y)
  ax.plot(x, model.predict(x.reshape(-1, 1)), color='red')
  plt.show()

if not isinstance(first_r[0], supported_types):
  print("First column data type isn't supported.")
elif len(second_r) > 0 and not isinstance(second_r[0], supported_types):
  print("Second column data type isn't supported.")
else:
  display_graph(first_r, second_r)



First column data type isn't supported.
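
Even for numeric columns, missing values can get in the way: empty CSV cells are read by pandas as NaN, and scikit-learn's `LinearRegression` rejects input that contains NaN. A possible preprocessing step, sketched here with a hypothetical helper `drop_nan_pairs` and made-up values, keeps only the positions where both arrays hold a finite number.

```python
import numpy as np

def drop_nan_pairs(x, y):
  """Keep only the positions where both x and y are finite numbers."""
  x = np.asarray(x, dtype=float)
  y = np.asarray(y, dtype=float)
  mask = np.isfinite(x) & np.isfinite(y)
  return x[mask], y[mask]

# Made-up values: NaN marks a missing cell.
x_clean, y_clean = drop_nan_pairs(
  [1.0, 2.0, np.nan, 4.0],
  [10.0, np.nan, 30.0, 40.0],
)
print(x_clean, y_clean)  # [1. 4.] [10. 40.]
```

The cleaned arrays could then be passed to `display_graph(x_clean, y_clean)`. This sketch assumes both columns have the same length and real-valued data.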
