# ABOUT DATA MINING

Data mining is the process of discovering patterns, correlations, trends, and useful information in large data sets, using a combination of statistical analysis, machine learning, and database systems.
The goal of data mining is to extract knowledge from a data set and transform it into an understandable structure for further use.

##### Key points to understand about data mining:
* Pattern Discovery
* Predictive Analysis
* Large Datasets
* Diverse Applications
* Techniques and Tools
* Ethical Considerations

**Pattern Discovery**: Data mining identifies both recurring patterns and anomalies (unusual patterns) in data. For example, it can find frequent buying patterns in supermarket transaction data.
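As a minimal sketch of the supermarket example, the snippet below counts how often pairs of items appear together in the same basket. The transactions and the `frequent_pairs` helper are invented for illustration. Real association-rule mining would typically rely on a dedicated algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair is counted once per basket
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical transactions, invented for this example
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

print(frequent_pairs(transactions, min_support=2))
```

Raising `min_support` narrows the result to pairs that appear together more often.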

**Predictive Analysis**: Data mining can be used to build models that predict future trends or behaviors from historical data.

**Large Datasets**: Data mining is particularly useful for dealing with large quantities of data (Big Data), where manual analysis would be impractical or impossible.

**Diverse Applications**: It is used across a wide range of industries, such as finance for credit scoring and fraud detection, marketing for customer segmentation, retail for inventory management, and healthcare for predicting patient outcomes.

**Techniques and Tools**: Data mining employs a variety of techniques including clustering (finding groups of similar items), classification (assigning items to predefined categories), regression (predicting a continuous value), and association rule learning (discovering relationships between variables).
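To make one of these techniques concrete, here is a minimal sketch of regression: an ordinary least-squares fit of a straight line, written in plain Python. The `fit_line` helper and the data points are hypothetical. In practice a library such as scikit-learn or NumPy would usually do this work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, invented for this example
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict the continuous value at x = 5
print(slope, intercept, prediction)
```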

**Ethical Considerations**: With its ability to uncover patterns and personal information, data mining raises privacy and ethical concerns. It's important to use data mining techniques responsibly and in compliance with privacy laws and ethical standards.

# CONCLUSION

*In summary, data mining is a powerful tool that allows organizations to make informed decisions by identifying trends, patterns, and relationships in data that might not be immediately apparent. Its applications are vast and can provide significant competitive advantages and insights for businesses and researchers.*

In [7]:
import requests

def get_data_from_url(url, token):
    # Set up the headers with the web token for authentication
    headers = {
        'token': token,
    }

    # Make the HTTP GET request to the URL
    response = requests.get(url, headers=headers)

    # Check if the request was successful
    if response.status_code == 200:
        # Process the data (assuming it's JSON)
        data = response.json()
        return data
    else:
        # Handle errors (e.g., print an error message)
        print(f'Failed to fetch data: {response.status_code}')
        return None

# Example usage
url = 'https://www.ncei.noaa.gov/cdo-web/api/v2/datasets/'  # NOAA CDO endpoint that lists the available datasets
token = 'YOUR_NOAA_TOKEN'  # Replace with your own web token; do not publish real tokens
data = get_data_from_url(url, token)

if data is not None:
    print(data)


{'metadata': {'resultset': {'offset': 1, 'count': 11, 'limit': 25}}, 'results': [{'uid': 'gov.noaa.ncdc:C00861', 'mindate': '1750-02-01', 'maxdate': '2024-01-24', 'name': 'Daily Summaries', 'datacoverage': 1, 'id': 'GHCND'}, {'uid': 'gov.noaa.ncdc:C00946', 'mindate': '1763-01-01', 'maxdate': '2024-01-01', 'name': 'Global Summary of the Month', 'datacoverage': 1, 'id': 'GSOM'}, {'uid': 'gov.noaa.ncdc:C00947', 'mindate': '1763-01-01', 'maxdate': '2024-01-01', 'name': 'Global Summary of the Year', 'datacoverage': 1, 'id': 'GSOY'}, {'uid': 'gov.noaa.ncdc:C00345', 'mindate': '1991-06-05', 'maxdate': '2024-01-27', 'name': 'Weather Radar (Level II)', 'datacoverage': 0.95, 'id': 'NEXRAD2'}, {'uid': 'gov.noaa.ncdc:C00708', 'mindate': '1994-05-20', 'maxdate': '2024-01-25', 'name': 'Weather Radar (Level III)', 'datacoverage': 0.95, 'id': 'NEXRAD3'}, {'uid': 'gov.noaa.ncdc:C00821', 'mindate': '2010-01-01', 'maxdate': '2010-01-01', 'name': 'Normals Annual/Seasonal', 'datacoverage': 1, 'id': 'NORMAL
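The response printed above is a dictionary with a `metadata` block and a `results` list. As a minimal sketch, the snippet below pulls the dataset IDs, names, and date ranges out of such a payload. The `sample` dictionary copies two entries from the output above, and `list_datasets` is a hypothetical helper, not part of any library.

```python
def list_datasets(payload):
    """Return (id, name, mindate, maxdate) tuples from a datasets payload."""
    rows = []
    # .get() guards against a payload that has no 'results' key
    for item in payload.get("results", []):
        rows.append((item["id"], item["name"], item["mindate"], item["maxdate"]))
    return rows

# Two entries copied from the response shown above
sample = {
    "metadata": {"resultset": {"offset": 1, "count": 11, "limit": 25}},
    "results": [
        {"uid": "gov.noaa.ncdc:C00861", "mindate": "1750-02-01", "maxdate": "2024-01-24",
         "name": "Daily Summaries", "datacoverage": 1, "id": "GHCND"},
        {"uid": "gov.noaa.ncdc:C00946", "mindate": "1763-01-01", "maxdate": "2024-01-01",
         "name": "Global Summary of the Month", "datacoverage": 1, "id": "GSOM"},
    ],
}

for dataset_id, name, mindate, maxdate in list_datasets(sample):
    print(f"{dataset_id:8} {name} ({mindate} to {maxdate})")
```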

In [9]:
import requests
import os
import json

# Function to fetch data from NOAA for a given range
def fetch_noaa_data(start_date, end_date, token):
    base_url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"
    params = {
        'datasetid': 'GHCND',
        'locationid': 'ZIP:80249',
        'units': 'standard',
        'startdate': start_date,
        'enddate': end_date,
        'limit': 1000
    }
    headers = {
        'token': token
    }

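    response = requests.get(base_url, headers=headers, params=params)
    # Fail loudly on HTTP errors instead of saving an error payload to disk
    response.raise_for_status()
    return response.json()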

# Directory to save JSON files
data_dir = 'data/'
os.makedirs(data_dir, exist_ok=True)

# Your API token (replace with your own; do not publish real tokens)
api_token = 'YOUR_NOAA_TOKEN'

# Loop through the years 2008-2022
for year in range(2008, 2023):
    # Construct the start and end dates for the API call
    start_date = f'{year}-12-15'
    end_date = f'{year+1}-01-21'
    
    # Fetch the data
    data = fetch_noaa_data(start_date, end_date, api_token)
    
    # Save the data to a JSON file
    file_path = os.path.join(data_dir, f'winter_{year}-{year+1}.json')
    with open(file_path, 'w') as file:
        json.dump(data, file, indent=4)

    print(f'Data for {year} saved to {file_path}')


Data for 2008 saved to data/winter_2008-2009.json
Data for 2009 saved to data/winter_2009-2010.json
Data for 2010 saved to data/winter_2010-2011.json
Data for 2011 saved to data/winter_2011-2012.json
Data for 2012 saved to data/winter_2012-2013.json
Data for 2013 saved to data/winter_2013-2014.json
Data for 2014 saved to data/winter_2014-2015.json
Data for 2015 saved to data/winter_2015-2016.json
Data for 2016 saved to data/winter_2016-2017.json
Data for 2017 saved to data/winter_2017-2018.json
Data for 2018 saved to data/winter_2018-2019.json
Data for 2019 saved to data/winter_2019-2020.json
Data for 2020 saved to data/winter_2020-2021.json
Data for 2021 saved to data/winter_2021-2022.json
Data for 2022 saved to data/winter_2022-2023.json
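Each request above asks for at most 1000 records, so a winter with more matching records would be silently cut off. The sketch below shows one way to collect every page. It assumes that the API accepts an `offset` request parameter alongside `limit`, and that `count` in the `metadata` block is the total number of matching records; both assumptions should be checked against the NOAA documentation. `fetch_all` and `fake_fetch_page` are hypothetical names, and the fake function stands in for a real request so the example runs offline.

```python
def fetch_all(fetch_page, limit=1000):
    """Collect results across pages.

    fetch_page(offset, limit) must return a dict shaped like the API payload:
    {'metadata': {'resultset': {'count': ...}}, 'results': [...]}.
    """
    results = []
    offset = 1  # the payload shown earlier reports a 1-based offset
    while True:
        payload = fetch_page(offset, limit)
        page = payload.get("results", [])
        results.extend(page)
        count = payload.get("metadata", {}).get("resultset", {}).get("count", 0)
        offset += limit
        # Stop on an empty page or once the next offset passes the total
        if not page or offset > count:
            break
    return results

# Stand-in for a real request, so the sketch runs offline
def fake_fetch_page(offset, limit):
    records = [{"value": i} for i in range(1, 2501)]
    page = records[offset - 1 : offset - 1 + limit]
    resultset = {"offset": offset, "count": len(records), "limit": limit}
    return {"metadata": {"resultset": resultset}, "results": page}

all_records = fetch_all(fake_fetch_page, limit=1000)
print(len(all_records))
```

In real use, `fetch_page` would wrap the `requests.get` call and add the offset to the request parameters.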


In [10]:
import pandas as pd
import os
import json

# Initialize a DataFrame to store the aggregated data
aggregated_data = pd.DataFrame()

# Directory containing the JSON files
data_dir = 'data/'

# Process each JSON file and calculate the TAVG
for file_name in os.listdir(data_dir):
    if file_name.startswith('winter_') and file_name.endswith('.json'):
        file_path = os.path.join(data_dir, file_name)
        
        with open(file_path, 'r') as file:
            data = json.load(file)
            
            # Create a DataFrame from the current JSON file's data
            df = pd.DataFrame(data['results'])
            
            # Split the records into TMAX and TMIN readings
            df_tmax = df[df['datatype'] == 'TMAX'].rename(columns={'value': 'TMAX'}).drop('datatype', axis=1)
            df_tmin = df[df['datatype'] == 'TMIN'].rename(columns={'value': 'TMIN'}).drop('datatype', axis=1)
            
            # Merge the TMAX and TMIN data on date and station, so readings from
            # different stations are never paired (assumes a 'station' column)
            df_merged = pd.merge(df_tmax, df_tmin, on=['date', 'station'])
            
            # Calculate TAVG
            df_merged['TAVG'] = (df_merged['TMAX'] + df_merged['TMIN']) / 2
            
            # Append the processed data to the aggregated DataFrame
            aggregated_data = pd.concat([aggregated_data, df_merged])

# Set the date as the index
aggregated_data['date'] = pd.to_datetime(aggregated_data['date'])
aggregated_data.set_index('date', inplace=True)

# Sort by date
aggregated_data.sort_index(inplace=True)

# Select only the required columns
aggregated_data = aggregated_data[['TMAX', 'TMIN', 'TAVG']]

# Save to CSV
csv_file_path = os.path.join(data_dir, 'all_data_max_min_avg.csv')
aggregated_data.to_csv(csv_file_path)

print(f'CSV file saved to {csv_file_path}')


CSV file saved to data/all_data_max_min_avg.csv
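Computing TAVG relies on pairing each TMAX reading with the TMIN reading from the same day and, if the location has more than one station, from the same station. The following is a minimal sketch of that pairing in plain Python. The records, station IDs, and the `daily_tavg` helper are invented for illustration, and the sketch assumes every record carries a `station` field.

```python
def daily_tavg(records):
    """Average TMAX and TMIN per (date, station); skip incomplete pairs."""
    readings = {}
    for rec in records:
        if rec["datatype"] in ("TMAX", "TMIN"):
            key = (rec["date"], rec["station"])
            readings.setdefault(key, {})[rec["datatype"]] = rec["value"]
    return {
        key: (vals["TMAX"] + vals["TMIN"]) / 2
        for key, vals in readings.items()
        if "TMAX" in vals and "TMIN" in vals
    }

# Hypothetical records; station IDs and values are invented
records = [
    {"date": "2008-12-15T00:00:00", "datatype": "TMAX", "station": "STATION_A", "value": 10},
    {"date": "2008-12-15T00:00:00", "datatype": "TMIN", "station": "STATION_A", "value": -4},
    {"date": "2008-12-15T00:00:00", "datatype": "TMAX", "station": "STATION_B", "value": 12},
    {"date": "2008-12-15T00:00:00", "datatype": "PRCP", "station": "STATION_A", "value": 0.1},
]

print(daily_tavg(records))
```

Here `STATION_B` is dropped because it has no TMIN reading, and the precipitation record is ignored.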




In [22]:
import pandas as pd
import os
import json
from datetime import datetime

# Directory containing the JSON files
data_dir = 'data/'  # Modify this to your local data directory

# Initialize a DataFrame to store the compiled data
compiled_data = pd.DataFrame()

# Loop over each year and process the corresponding JSON file
for year in range(2008, 2023):
    year_range = f'{year}-{year+1}'
    json_file = f'winter_{year_range}.json'
    json_path = os.path.join(data_dir, json_file)

    if os.path.exists(json_path):
        with open(json_path, 'r') as file:
            data = json.load(file)['results']
            
            # Temporary storage for TMAX and TMIN values
            tmax_values = {}
            tmin_values = {}

            # Collect the TMAX and TMIN values, keyed by month-day
            for record in data:
                date_str = datetime.strptime(record['date'], '%Y-%m-%dT%H:%M:%S').strftime('%m-%d')
                if record['datatype'] == 'TMAX':
                    tmax_values[date_str] = record['value']
                elif record['datatype'] == 'TMIN':
                    tmin_values[date_str] = record['value']

            # Calculate TAVG for each date
            for date_str in tmax_values:
                if date_str in tmin_values:
                    tavg = (tmax_values[date_str] + tmin_values[date_str]) / 2
                    # print(tavg,date_str, year_range )
                    compiled_data.at[date_str, year_range] = tavg
            print(compiled_data)
# Save the compiled data to a CSV file
# csv_file_path = os.path.join(data_dir, 'all_data_min.csv')
# compiled_data.to_csv(csv_file_path, index=True)

# print(f"CSV file saved to {csv_file_path}")


       2008-2009
12-15       -8.5
12-16       13.0
12-17       25.5
12-18       16.5
12-19       29.0
12-20       15.5
12-21        9.5
12-22       15.0
12-23       19.0
12-24       27.0
12-25       30.0
12-26       30.0
12-27       22.5
12-28       41.5
12-29       43.5
12-30       39.5
12-31       35.5
01-01       45.0
01-02       47.5
01-03       30.5
01-04       17.0
01-05       27.0
01-06       33.0
01-07       41.0
01-08       46.5
01-09       31.5
01-10       32.0
01-11       32.5
01-12       26.0
01-13       31.5
01-14       35.0
01-15       35.0
01-16       46.0
01-17       40.5
01-18       46.5
01-19       48.5
01-20       48.5
01-21       56.0
       2008-2009  2009-2010
12-15       -8.5       37.5
12-16       13.0       42.5
12-17       25.5       38.5
12-18       16.5       30.0
12-19       29.0       32.5
12-20       15.5       38.0
12-21        9.5       45.5
12-22       15.0       36.0
12-23       19.0       20.0
12-24       27.0       12.0
12-25       30.0       11.5
1