## Data With City and SubRegions

In [27]:
import json
import csv
from tqdm import tqdm

#### Loading File:

In [28]:
file_path = "data.json"
with open(file_path, "r") as file:
    data = json.load(file)
#print(data)

This code reads the contents of the file named data.json, parses it as JSON, and stores the resulting Python object in the variable data.

In [29]:
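def csv_writer(data, method, save_as, headers=None):
    """Write a dict, or a list of dicts, to the CSV file at save_as."""

    file_path = save_as

    with open(file_path, method, encoding='utf8') as csvfile:
        if headers is None:
            try:
                # List of dictionaries: take the columns from the first row.
                headers = data[0].keys()
            except Exception:
                # Single dictionary: its own keys are the columns.
                headers = data.keys()
        writer = csv.DictWriter(
            csvfile, lineterminator='\n', fieldnames=headers)
        # Append mode assumes the header row is already in the file.
        if method != 'a':
            writer.writeheader()
        try:
            for d in data:
                writer.writerow(d)
        except Exception:
            # Fall back to writing data as a single row.
            writer.writerow(data)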

This function, csv_writer, takes in data (usually in the form of dictionaries), a method ('a' for append or other values for write), a filename (save_as), and optional headers, then writes or appends the data to a CSV file:

* file_path: The file path for the CSV is taken from the save_as parameter.
* The file is opened in the specified mode (method).
* If headers aren't provided, the function tries to infer them from the first item of the data (assuming it's a list of dictionaries) or from the data itself (assuming it's a single dictionary).
* A DictWriter object from the csv module is used to write the data to the CSV. This object allows dictionaries to be written directly to a CSV file.
* If the method isn't 'a' (append), the headers are written to the CSV.
* The function then tries to write each dictionary in the data to the CSV. If this fails (indicating that the data is not a list of dictionaries but perhaps a single dictionary), it writes the single dictionary directly.

This function provides a flexible way to write dictionaries to a CSV, either as a new file or as appended data, with inferred or explicitly specified headers.
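The write-then-append behaviour described above can be sketched with csv.DictWriter directly. This is a minimal illustration rather than the notebook's own code: the sample rows, the temporary file name demo.csv, and the variable names are all hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the field names mirror CSV headers used later on.
rows = [
    {"city": "Delhi", "resturant_name": "Example Cafe", "rating": "4.1"},
    {"city": "Pune", "resturant_name": "Sample Diner", "rating": "3.9"},
]
extra = {"city": "Kochi", "resturant_name": "Demo Kitchen", "rating": "4.4"}

# Hypothetical output path inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Mode 'w': create the file and write the header row once.
with open(path, "w", encoding="utf8", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Mode 'a': append a single dictionary without repeating the header.
with open(path, "a", encoding="utf8", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=list(extra.keys()))
    writer.writerow(extra)

# Read the file back to confirm the header appears once and all rows are present.
with open(path, encoding="utf8", newline="") as csvfile:
    written = list(csv.DictReader(csvfile))
```

Skipping the header in append mode only works if the file already has one, which is why the function ties writeheader() to the method argument.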

In [13]:
cities = ['Ahmedabad',
 'Bangalore',
 'Chandigarh',
 'Chennai',
 'Coimbatore',
 'Dehradun',
 'Delhi',
 'Gadwal',
 'Gurgaon',
 'Guwahati',
 'Hyderabad',
 'Indore',
 'Jaipur',
 'Jalandhar',
 'Kanpur',
 'Kochi',
 'Kolkata',
 'Lucknow',
 'Ludhiana',
 'Mumbai',
 'Mysore',
 'Nagpur',
 'Nashik',
 'Noida',
 'Pune',
 'Surat',
 'Vadodara',
 'Vijayawada',
 'Vizag']

In [14]:
filename = 'Swiggy_Data_With_City_SubRegions.csv'
with open(filename, 'w') as f:
    wr = csv.writer(f)
    wr.writerow(['city','sub_region','resturant_name','rating','rating_count','cost','cuisine'])

for city in tqdm(cities):
    sub_cities = list(data[city].keys())[:-1]
    for sub_city in sub_cities:
        if sub_city !='link':
            resturant_ids = list(data[city][sub_city]['restaurants'].keys())
            for resutrant in resturant_ids:
                try:
                    city = city
                except:
                    city=" "
                try:
                    sub_city = sub_city
                except:
                    sub_city=" "
                try:
                    name = data[city][sub_city]['restaurants'][resutrant]['name']
                except:
                    name=" "
                try:
                    rating = data[city][sub_city]['restaurants'][resutrant]['rating']
                except:
                    rating=" "

                try:
                    rating_count = data[city][sub_city]['restaurants'][resutrant]['rating_count']
                except:
                    # Blank out the rating count itself, not the rating.
                    rating_count=" "
                
                    
                try:
                    cost = data[city][sub_city]['restaurants'][resutrant]['cost']
                except:
                    cost = " "
                try:
                    cuisine = data[city][sub_city]['restaurants'][resutrant]['cuisine']
                except:
                    cuisine=" "

                
                with open(filename, 'a') as f:
                    wr = csv.writer(f)
                    wr.writerow([city,sub_city,name,rating,rating_count,cost,cuisine])

100%|█████████████████████████████████████████████████████████████████| 30/30 [00:22<00:00,  1.34it/s]


This code extracts restaurant data from the nested dictionary (data) and writes it to a CSV file named 'Swiggy_Data_With_City_SubRegions.csv'.

* A new CSV file, 'Swiggy_Data_With_City_SubRegions.csv', is initialized with the headers: city, sub_region, resturant_name, rating, rating_count, cost, and cuisine.
* For each city in the cities list:
  * The sub-regions (or sub-cities) are read from the data dictionary, skipping the last key and any key named 'link'.
  * For each sub_city, the list of restaurant IDs is retrieved.
  * For each restaurant, the name, rating, rating count, cost, and cuisine are extracted. If any of these details is missing or causes an error, a blank space is stored instead.
  * The resulting row is appended to the CSV file.
* The tqdm module displays a progress bar while the cities are processed.
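As an alternative sketch of the same flattening step, the rows can be produced by a generator that uses dict.get() for missing fields, and then written through a single open handle instead of reopening the file for every restaurant. The sample dictionary, the iter_rows helper, and the FIELDS list are hypothetical, and this version skips only the 'link' key (it does not reproduce the [:-1] slice used above).

```python
import csv
import io

# Hypothetical miniature of the nested structure:
# city -> sub-region -> 'restaurants' -> restaurant id -> fields.
sample = {
    "Delhi": {
        "Rohini": {
            "restaurants": {
                "101": {"name": "Example Cafe", "rating": "4.1",
                        "rating_count": "100+ ratings", "cost": "₹ 300",
                        "cuisine": "Chinese"},
                "102": {"name": "Sample Diner"},  # most fields missing
            }
        },
        "link": {"restaurants": {}},  # skipped below
    }
}

FIELDS = ["name", "rating", "rating_count", "cost", "cuisine"]


def iter_rows(data, cities):
    """Yield one flat row per restaurant, using ' ' for missing fields."""
    for city in cities:
        for sub_city, block in data[city].items():
            if sub_city == "link":
                continue
            for record in block["restaurants"].values():
                yield [city, sub_city] + [record.get(f, " ") for f in FIELDS]


rows = list(iter_rows(sample, ["Delhi"]))

# Write everything through one handle (an in-memory buffer in this sketch).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["city", "sub_region", "resturant_name", "rating",
                 "rating_count", "cost", "cuisine"])
writer.writerows(rows)
```

Using get() with a default keeps each field's fallback tied to that field, which avoids the kind of slip where one except branch assigns to the wrong variable.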
     

## Only with Sub Region Name

In [15]:
data['Abohar'].keys()

dict_keys(['link', 'restaurants'])

When you call data['Abohar'].keys(), you're asking for a list (or view) of all the top-level attributes or keys available for the 'Abohar' entry in the data dictionary.
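Entries such as 'Abohar' hold a 'restaurants' key directly, whereas the cities processed earlier hold sub-regions first. A small sketch of how the two shapes could be told apart is shown here; the sample dictionary, the example URL, and the helper name has_direct_restaurants are hypothetical.

```python
# Hypothetical miniature of the top-level dictionary.
sample = {
    # Flat entry: 'restaurants' sits directly under the top-level key.
    "Abohar": {"link": "https://example.com/abohar", "restaurants": {}},
    # Nested entry: sub-regions come first, each with its own 'restaurants'.
    "Delhi": {"Rohini": {"restaurants": {}}, "link": {"restaurants": {}}},
}


def has_direct_restaurants(entry):
    """Return True when the entry exposes 'restaurants' at its top level."""
    return isinstance(entry, dict) and "restaurants" in entry


flat_entries = [key for key, value in sample.items()
                if has_direct_restaurants(value)]
nested_entries = [key for key, value in sample.items()
                  if not has_direct_restaurants(value)]
```

An explicit check like this makes it visible which keys are being skipped, rather than relying on an exception to filter them out.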

In [16]:
filename = 'Swiggy_Data_With_SubRegions.csv'
with open(filename, 'w') as f:
    wr = csv.writer(f)
    wr.writerow(['city','sub_region','resturant_name','rating','rating_count','cost','cuisine'])
    
for sub_city in data.keys():
    try:
        for resutrant in data[sub_city]['restaurants'].keys(): 
                try:
                    city = "NA"
                except:
                    city = " "
                try:
                    sub_city = sub_city
                except:
                    sub_city=" "
                try:
                    name = data[sub_city]['restaurants'][resutrant]['name']
                except:
                    name=" "
                try:
                    rating = data[sub_city]['restaurants'][resutrant]['rating']
                except:
                    rating=" "
                try:
                    rating_count = data[sub_city]['restaurants'][resutrant]['rating_count']
                except:
                    rating_count=" "
            
                try:
                    cost = data[sub_city]['restaurants'][resutrant]['cost']
                except:
                    cost = " "
                try:
                    cuisine = data[sub_city]['restaurants'][resutrant]['cuisine']
                except:
                    cuisine=" "
                with open(filename, 'a') as f:
                    wr = csv.writer(f)
                    wr.writerow([city,sub_city,name,rating,rating_count,cost,cuisine])
    except:
        pass

This code extracts restaurant data from the data dictionary and saves it to a CSV file named 'Swiggy_Data_With_SubRegions.csv'.

* A new CSV file, 'Swiggy_Data_With_SubRegions.csv', is created with the headers: city, sub_region, resturant_name, rating, rating_count, cost, and cuisine.
* For each sub_city (or sub-region) in the keys of the data dictionary, the code attempts to iterate over each restaurant in that entry's 'restaurants' sub-dictionary.
* For each restaurant:
  * The city is set to "NA", since this structure provides no city information.
  * The restaurant's name, rating, rating count, cost, and cuisine are extracted. If any of these details is missing or causes an error, a blank space is stored instead.
  * The resulting row is appended to the previously created CSV file.
* If any error occurs while processing a particular sub_city, the outer try-except block with the pass statement skips that entry and moves on to the next one.


## Restaurants Present in Links:

In [17]:
filename = 'Swiggy_Data_With_Links.csv'
with open(filename, 'w') as f:
    wr = csv.writer(f)
    wr.writerow(['city','sub_region','resturant_name','rating','rating_count','cost','cuisine'])

erros =[]

for city in tqdm(cities):
    try:
        resturant_ids = list(data[city]['link']['restaurants'].keys())
        for resutrant in resturant_ids:
            try:
                city = city
            except:
                city=" "
            try:
                sub_city = 'link'
            except:
                sub_city=" "
            try:
                name = data[city]['link']['restaurants'][resutrant]['name']
            except:
                name=" "
            try:
                rating = data[city]['link']['restaurants'][resutrant]['rating']
            except:
                rating=" "
                
            try:
                rating_count = data[city]['link']['restaurants'][resutrant]['rating_count']
            except:
                rating_count=" "

            
            try:
                cost = data[city]['link']['restaurants'][resutrant]['price']
            except:
                cost = " "
            try:
                cuisine = data[city]['link']['restaurants'][resutrant]['type']
            except:
                cuisine=" "

            
            with open(filename, 'a') as f:
                wr = csv.writer(f)
                wr.writerow([city,sub_city,name,rating,rating_count,cost,cuisine])
    except:
        erros.append(city)
        pass

100%|█████████████████████████████████████████████████████████████████| 30/30 [00:03<00:00,  9.96it/s]


This code extracts restaurant data specifically from the 'link' sub-region of each city in the data dictionary and saves it to a CSV file named 'Swiggy_Data_With_Links.csv'.

* A new CSV file, 'Swiggy_Data_With_Links.csv', is created with the headers: city, sub_region, resturant_name, rating, rating_count, cost, and cuisine.
* An empty list named erros (likely a typo for "errors") is initialized to capture cities that cause exceptions during processing.
* For each city in the cities list, the code attempts to retrieve the list of restaurant IDs from the 'link' sub-region of that city in the data dictionary.
* For each restaurant:
  * The city stays as the one currently being iterated over.
  * The sub_city is set to 'link'.
  * The restaurant's name, rating, rating count, cost (stored as 'price' here), and cuisine (stored as 'type' here) are extracted. If any of these details is missing or causes an error, a blank space is stored instead.
  * The resulting row is appended to the CSV file.
* If any error occurs while processing a particular city, the city's name is added to the erros list, and the code moves on to the next city.
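Because the 'link' records use different key names ('price' and 'type'), one way to sketch the renaming is a small column-to-key map. Everything below is illustrative: the sample data, the link_rows helper, and LINK_FIELD_MAP are hypothetical, and the presence of a 'rating_count' key inside 'link' records is an assumption.

```python
# Maps each output column to the key used inside a 'link' record.
# 'rating_count' is assumed to exist under that name; the data may differ.
LINK_FIELD_MAP = {
    "resturant_name": "name",
    "rating": "rating",
    "rating_count": "rating_count",
    "cost": "price",
    "cuisine": "type",
}


def link_rows(data, cities):
    """Return (rows, skipped): flat rows plus the cities without 'link' data."""
    rows, skipped = [], []
    for city in cities:
        try:
            restaurants = data[city]["link"]["restaurants"]
        except (KeyError, TypeError):
            # Missing city, missing 'link', or 'link' is not a dictionary.
            skipped.append(city)
            continue
        for record in restaurants.values():
            rows.append([city, "link"] +
                        [record.get(key, " ") for key in LINK_FIELD_MAP.values()])
    return rows, skipped


# Hypothetical sample data.
sample = {
    "Delhi": {"link": {"restaurants": {
        "7": {"name": "Example Cafe", "rating": "4.0",
              "price": "₹ 200", "type": "Bakery"}}}},
    "Pune": {},  # no 'link' entry
}

rows, skipped = link_rows(sample, ["Delhi", "Pune", "Surat"])
```

Catching only KeyError and TypeError around the lookup keeps the skipped list limited to cities whose 'link' data is absent or malformed, so unrelated bugs are not silently recorded as data problems.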


### Patna Data

## Final CSV

In [18]:
import pandas as pd
df_1 = pd.read_csv("Swiggy_Data_With_City_SubRegions.csv")
df_2 = pd.read_csv("Swiggy_Data_With_Links.csv")
df_3 = pd.read_csv("Swiggy_Data_With_SubRegions.csv")

In [19]:
final_df = pd.concat([df_1,df_2,df_3])

In [20]:
final_df.head()

|   | city | sub_region | resturant_name | rating | rating_count | cost | cuisine |
|---|---|---|---|---|---|---|---|
| 0 | Ahmedabad | Vastrapur | M.A.D By Tomato'S | 4.3 | 100+ ratings | ₹ 1200 | Indian,Chinese |
| 1 | Ahmedabad | Vastrapur | Tea Post | 4.0 | 100+ ratings | ₹ 150 | Fast Food |
| 2 | Ahmedabad | Vastrapur | Shanghai Chicken Lolipops | -- | Too Few Ratings | ₹ 300 | Chinese,Fast Food |
| 3 | Ahmedabad | Vastrapur | Ministry Of Momos | -- | Too Few Ratings | ₹ 300 | Chinese |
| 4 | Ahmedabad | Vastrapur | Sizzling - The Cake Room | -- | Too Few Ratings | ₹ 350 | Desserts |


This code combines data from three different CSV files into a single DataFrame and then previews the top five rows of the merged data.
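One detail worth knowing about pd.concat is that, by default, each input keeps its own row labels, so the combined index contains repeats. The sketch below uses two small hypothetical DataFrames (df_a and df_b are not part of the notebook) to show the default behaviour next to ignore_index=True.

```python
import pandas as pd

# Hypothetical stand-ins for the per-file DataFrames.
df_a = pd.DataFrame({"city": ["Delhi", "Pune"], "rating": ["4.1", "--"]})
df_b = pd.DataFrame({"city": ["Kochi"], "rating": ["3.8"]})

# Default: rows are stacked and each frame keeps its original labels.
stacked = pd.concat([df_a, df_b])

# ignore_index=True: rows are stacked and relabelled 0..n-1.
renumbered = pd.concat([df_a, df_b], ignore_index=True)

stacked_labels = list(stacked.index)
renumbered_labels = list(renumbered.index)
```

Since the notebook later saves with index=False, the repeated labels do not reach the output file; the difference matters mainly for label-based lookups on the combined DataFrame.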

In [21]:
final_df.to_csv("SwiggyCleanedData.csv",index=False)

This line exports the combined data in the final_df DataFrame to a new CSV file.

In [22]:
final_df.shape

(181404, 7)

The final_df DataFrame has 181404 rows and 7 columns.

In [23]:
df_1.shape

(110440, 7)

In [24]:
df_2.shape

(17951, 7)

In [25]:
df_3.shape

(53013, 7)
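Since pd.concat stacks rows, the three row counts should add up to the row count of final_df. A quick arithmetic check using the shapes reported above (the shapes dictionary is just a convenience for this sketch):

```python
# Shapes copied from the outputs above.
shapes = {"df_1": (110440, 7), "df_2": (17951, 7), "df_3": (53013, 7)}
final_shape = (181404, 7)

# Row counts add up under concatenation; the column count stays the same.
total_rows = sum(rows for rows, _ in shapes.values())
column_counts = {cols for _, cols in shapes.values()}

assert total_rows == final_shape[0]
assert column_counts == {final_shape[1]}
```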