## Project Title  
**Airbnb Top 10 Listings by Country**

---

## Objective  
Identify the **top 10** listings for each country using a combined score derived from listing rating and number of reviews.

***

## Dataset  
- Source file: `airbnb_v2.csv` with 12,805 rows and 23 columns.
- Core columns used: `id`, `name`, `rating`, `reviews`, `country`, `price`, `bathrooms`, `bedrooms`, `beds`, `guests`, `features`, `amenities`.

***

## Data Cleaning  
- Remove unused columns to simplify the dataset.
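- Convert `rating` and `reviews` to numeric types: the placeholder `New` in `rating` becomes `0`, and thousands separators (for example `1,003`) are removed from `reviews`.
- Strip leading and trailing whitespace from `country` so that variants such as ` Turkey` and `Turkey` count as one category.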
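
The cleaning steps above can be sketched on a few made-up rows. The `raw` frame and its values are assumptions for illustration only; the sketch mirrors the formats seen in the dataset (a `New` placeholder, comma-separated thousands, padded country names) rather than reproducing the notebook's exact cells.

```python
import pandas as pd

# Toy rows (assumed for illustration) mimicking the raw column formats
raw = pd.DataFrame({
    "rating": ["4.71", "New", "5.0"],
    "reviews": ["64", "0", "1,003"],
    "country": [" Turkey", "Turkey", " Georgia"],
})

# Unrated listings carry the placeholder 'New'; treat them as a rating of 0
raw["rating"] = raw["rating"].replace("New", "0").astype(float)

# Remove thousands separators before casting review counts to integers
raw["reviews"] = raw["reviews"].str.replace(",", "", regex=False).astype(int)

# Strip surrounding whitespace so ' Turkey' and 'Turkey' become one category
raw["country"] = raw["country"].str.strip()

print(raw.dtypes)
print(raw["country"].unique())
```

Mapping `New` to `0` is a modelling choice: such listings also have zero reviews, so their composite score is zero either way.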

***

## Feature Engineering  
- Create a composite metric:  
  - `rating_and_reviews_filter = rating * reviews`  
  - This captures both listing quality (rating) and popularity (review volume) in a single score.
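
As a worked example, the composite score can be computed by hand for three rows taken from the sample table in this document. The `listings` and `scores` names are hypothetical and exist only for this sketch.

```python
# (rating, reviews) pairs copied from the sample table in this README
listings = {
    "Perla bungalov": (4.71, 64),
    "Sapanca Breathable Bungalow": (5.00, 13),
    "Bungalov Ev 2": (0.00, 0),  # unrated listing
}

# Composite score: rating multiplied by number of reviews
scores = {
    name: round(rating * reviews, 2)
    for name, (rating, reviews) in listings.items()
}

# Rank from highest to lowest score
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score}")
```

Because the score grows linearly with the review count, a 4.71 listing with 64 reviews outranks a 5.00 listing with 13 reviews, so review volume tends to dominate the ranking.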

***

## Analysis Logic and Outputs  
- For each country in the dataset:  
  - Filter listings by `country`.  
  - Sort in descending order by `rating_and_reviews_filter`.  
  - Select the top 10 listings (`head(10)`).
- Export results:  
  - Save each country's top 10 listings to `csvs/<country>.csv`.
  - Archive all CSVs into `csvs.zip` for easy download and sharing.
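
The filter, sort, and select steps above can also be expressed without an explicit per-country filter. The following is a minimal sketch of that alternative on made-up data, not the notebook's own loop: `df_demo`, `TOP_N`, and the toy values are assumptions, and the score column name follows the notebook code.

```python
import pandas as pd

# Toy data (assumed for illustration); the real notebook works on airbnb_v2.csv
df_demo = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "country": ["Turkey", "Turkey", "Turkey", "Georgia", "Georgia"],
    "rating_and_reviews_filter": [301.44, 65.0, 128.96, 329.8, 12.0],
})

TOP_N = 2  # the project uses 10; 2 keeps the toy output short

# Sort once by score, then keep the first TOP_N rows of every country
top = (
    df_demo.sort_values("rating_and_reviews_filter", ascending=False)
    .groupby("country")
    .head(TOP_N)
)

# One sub-frame per country, ready to be written with to_csv
for country, group in top.groupby("country"):
    print(country, group["name"].tolist())
```

Sorting once and calling `head` per group avoids re-filtering the full frame for every country, which may matter on larger datasets.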

## Sample Data


```markdown
| id       | name                                 | country | rating | reviews | price | bathrooms | bedrooms | beds | guests | rating_and_reviews_filter |
|----------|--------------------------------------|---------|--------|---------|-------|-----------|----------|------|--------|---------------------------|
| 49849504 | Perla bungalov                       | Turkey  | 4.71   | 64      | 8078  | 1         | 2        | 1    | 2      | 301.44                    |
| 49871422 | Sapanca Breathable Bungalow          | Turkey  | 5.00   | 13      | 11339 | 1         | 1        | 2    | 4      | 65.00                     |
| 51245886 | Bungalov Ev 2                        | Turkey  | 0.00   | 0       | 6673  | 1         | 1        | 1    | 2      | 0.00                      |
| 48650769 | CasaMia White Suite Treehouse        | Turkey  | 0.00   | 0       | 14729 | 1         | 1        | 2    | 2      | 0.00                      |
| 50765985 | Ladin Bungalow                       | Turkey  | 0.00   | 0       | 12312 | 1         | 1        | 1    | 2      | 0.00                      |
| 4047216  | Lavender House                       | Turkey  | 0.00   | 0       | 13655 | 1         | 1        | 2    | 8      | 0.00                      |
| 53192531 | New Chalets on Farm with Fireplace 6 | Turkey  | 4.93   | 15      | 12845 | 1         | 2        | 3    | 6      | 73.95                     |
| 53151582 | Sapancaguldibibugalov                | Turkey  | 4.96   | 26      | 8128  | 1         | 1        | 1    | 4      | 128.96                    |
| 48255254 | VAD BUNGALOV SAPANCA                 | Turkey  | 4.95   | 20      | 11289 | 1         | 1        | 2    | 4      | 99.00                     |
| 62024530 | 9297447077 Villa Tuba                | Turkey  | 5.00   | 8       | 22758 | 1         | 2        | 4    | 6      | 40.00                     |
```

### 1) Exploratory Data Analysis

In [None]:
import pandas as pd

In [None]:
df=pd.read_csv('/content/airbnb_v2.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,id,name,rating,reviews,host_name,host_id,address,features,amenities,...,price,country,bathrooms,beds,guests,toiles,bedrooms,studios,checkin,checkout
0,0,49849504,Perla bungalov,4.71,64,Mehmetcan,357334205.0,"Kartepe, Kocaeli, Turkey","2 guests,2 bedrooms,1 bed,1 bathroom","Mountain view,Valley view,Lake access,Kitchen,...",...,8078,Turkey,1,1,2,0,2,0,Flexible,12 00 pm
1,1,50891766,Authentic Beach Architect Sheltered Villa with...,New,0,Fatih,386223873.0,"Kaş, Antalya, Turkey","4 guests,2 bedrooms,2 beds,2 bathrooms","Kitchen,Wifi,Dedicated workspace,Free parking ...",...,4665,Turkey,2,2,4,0,2,0,4 00 pm - 11 00 pm,10 00 am
2,2,50699164,cottages sataplia,4.85,68,Giorgi,409690853.0,"Imereti, Georgia","4 guests,1 bedroom,3 beds,1 bathroom","Mountain view,Kitchen,Wifi,Dedicated workspace...",...,5991,Georgia,1,3,4,0,1,0,After 1 00 pm,12 00 pm
3,3,49871422,Sapanca Breathable Bungalow,5.0,13,Melih,401873242.0,"Sapanca, Sakarya, Turkey","4 guests,1 bedroom,2 beds,1 bathroom","Mountain view,Valley view,Kitchen,Wifi,Free pa...",...,11339,Turkey,1,2,4,0,1,0,After 2 00 pm,12 00 pm
4,4,51245886,Bungalov Ev 2,New,0,Arp Sapanca,414884116.0,"Sapanca, Sakarya, Turkey","2 guests,1 bedroom,1 bed,1 bathroom","Kitchen,Wifi,Free parking on premises,TV,Air c...",...,6673,Turkey,1,1,2,0,1,0,After 2 00 pm,12 00 pm


#### 1.1) Check basic details

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12805 entries, 0 to 12804
Data columns (total 23 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    12805 non-null  int64  
 1   id            12805 non-null  int64  
 2   name          12805 non-null  object 
 3   rating        12805 non-null  object 
 4   reviews       12805 non-null  object 
 5   host_name     12797 non-null  object 
 6   host_id       12805 non-null  float64
 7   address       12805 non-null  object 
 8   features      12805 non-null  object 
 9   amenities     12805 non-null  object 
 10  safety_rules  12805 non-null  object 
 11  hourse_rules  12805 non-null  object 
 12  img_links     12805 non-null  object 
 13  price         12805 non-null  int64  
 14  country       12805 non-null  object 
 15  bathrooms     12805 non-null  int64  
 16  beds          12805 non-null  int64  
 17  guests        12805 non-null  int64  
 18  toiles        12805 non-nu

#### 1.2) Check Null Values

In [None]:
df.isnull().sum()

Unnamed: 0,0
Unnamed: 0,0
id,0
name,0
rating,0
reviews,0
host_name,8
host_id,0
address,0
features,0
amenities,0


#### 1.3) Check Duplicates

In [None]:
# Count rows whose id repeats an earlier row (0 means every id is unique)
print(df['id'].duplicated().sum())

0
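
To illustrate what a duplicate count looks like when repeats are present, here is a minimal sketch on a made-up `ids` series (an assumption, not the dataset). It relies on `Series.nunique` and `Series.duplicated`, both standard pandas calls.

```python
import pandas as pd

# Toy ids (assumed for illustration); the id 2 appears twice
ids = pd.Series([1, 2, 2, 3], name="id")

# Total rows minus distinct values gives the number of surplus rows
surplus = len(ids) - ids.nunique()
print(surplus)

# duplicated() flags every repeat of an earlier value; the sum is the same count
repeats = int(ids.duplicated().sum())
print(repeats)
```

Comparing the length of `unique()` with `nunique()` would not reveal the repeat here, since both count distinct values; the comparison has to be against the total number of rows.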


#### 1.4) Removing the Unnecessary Columns

In [None]:
# Drop columns that are not needed for the ranking
df = df.drop(columns=['Unnamed: 0', 'host_name', 'host_id', 'img_links', 'checkin', 'checkout'])

### 2) Data Cleaning

#### 2.1) Explore Necessary Columns

In [None]:
df['reviews'].unique()

array(['64', '0', '68', '13', '3', '77', '55', '116', '23', '21', '8',
       '52', '17', '61', '49', '38', '4', '15', '26', '5', '2', '85', '9',
       '20', '11', '7', '1', '241', '41', '10', '25', '154', '44', '19',
       '58', '137', '33', '12', '18', '16', '113', '125', '42', '244',
       '27', '188', '34', '69', '97', '6', '86', '114', '65', '91', '32',
       '53', '231', '99', '14', '120', '29', '28', '22', '123', '90',
       '124', '110', '79', '93', '54', '59', '74', '31', '48', '30', '89',
       '92', '35', '24', '229', '40', '102', '39', '194', '100', '147',
       '37', '78', '88', '71', '130', '576', '1,003', '261', '726', '127',
       '76', '84', '36', '60', '192', '46', '135', '62', '159', '158',
       '45', '95', '104', '211', '43', '121', '109', '210', '107', '98',
       '310', '96', '73', '50', '82', '66', '943', '254', '571', '72',
       '51', '246', '136', '165', '67', '807', '262', '548', '266', '181',
       '141', '171', '271', '160', '56', '608', '346',

In [None]:
df['rating'].unique()

array(['4.71', 'New', '4.85', '5.0', '4.67', '4.97', '4.89', '4.83',
       '4.87', '4.9', '4.75', '4.76', '4.84', '4.79', '4.82', '4.93',
       '4.96', '4.6', '4.64', '4.68', '4.8', '4.69', '4.58', '4.63',
       '4.91', '4.88', '4.94', '4.92', '4.86', '4.78', '4.57', '4.99',
       '4.98', '4.74', '4.7', '4.77', '4.72', '4.73', '4.95', '4.81',
       '4.53', '4.59', '4.62', '4.61', '4.51', '4.54', '4.56', '4.65',
       '4.55', '4.66', '4.52', '4.5', '4.33', '4.26', '3.33', '4.2',
       '4.0', '4.48', '4.38', '3.0', '4.49', '4.28', '4.45', '4.25',
       '4.36', '4.46', '4.44', '4.43', '3.75', '3.25', '4.13', '4.15',
       '3.88', '4.29', '4.4', '3.5', '4.31'], dtype=object)

In [None]:
df['country'].unique()

array([' Turkey', ' Georgia', ' Vietnam', ' Thailand', ' South Korea',
       ' India', ' Philippines', ' Japan', ' Lebanon', ' Taiwan',
       ' Israel', 'Turkey', ' Armenia', ' Cyprus', ' Lithuania',
       ' Slovakia', ' Denmark', ' Germany', ' Indonesia', ' Poland',
       ' Romania', ' Greece', ' Ukraine', ' Hungary', ' Albania',
       ' Bulgaria', ' Malaysia', ' Montenegro', ' Slovenia', ' Czechia',
       ' Sweden', ' Austria', ' Croatia', ' Tanzania', ' Italy',
       ' Sri Lanka', 'Philippines', ' Bosnia & Herzegovina', 'Montenegro',
       ' Kenya', ' Serbia', ' Seychelles', ' Finland', ' Norway',
       ' Iceland', ' Greenland', ' United States', ' Canada',
       ' Svalbard & Jan Mayen', 'Iceland', ' France', ' Australia',
       ' Morocco', ' Egypt', ' South Africa', ' Spain',
       ' United Arab Emirates', ' United Kingdom', ' Pakistan',
       'Thailand', ' Nepal', 'Sri Lanka', ' Singapore', ' Cambodia',
       ' Azerbaijan', ' Estonia', ' Latvia', ' Costa Rica',
     

#### 2.2) Clean and Typecast Rating column

In [None]:
# Unrated listings are marked 'New'; map them to 0 so the column can be cast to float
df['rating']=df['rating'].str.replace('New','0')
df['rating']=df['rating'].astype('float')


#### 2.3) Clean and Typecast Reviews column

In [None]:
# Remove thousands separators (e.g. '1,003') before casting to integer
df['reviews']=df['reviews'].str.replace(',','').astype('int')

#### 2.4) Clean Country column

In [None]:
df['country']=df['country'].str.strip()



### 3) Analyse The Data

#### 3.1) Create a new column rating_and_reviews_filter



In [None]:
df['rating_and_reviews_filter']=df['rating']*df['reviews']

#### 3.2) Find Top 10 Based on Ratings and Reviews For Each Country

In [None]:
!mkdir -p csvs

In [None]:
from tqdm.auto import tqdm

countries=df['country'].unique()

for country in tqdm(countries):
  # Keep only the listings for the current country
  data=df[df['country']==country]

  # Rank by the composite score and save the ten best listings
  data.sort_values(by='rating_and_reviews_filter', ascending=False).head(10).to_csv('csvs/'+country+'.csv')

  0%|          | 0/119 [00:00<?, ?it/s]

In [None]:
# Compress the csvs folder into csvs.zip
import shutil

shutil.make_archive('csvs', 'zip', 'csvs')

'/content/csvs.zip'
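
A quick way to confirm that the archive contains every exported file is to list its members. The sketch below is an assumption-laden stand-in: it builds two placeholder CSVs in a temporary directory instead of using the real `csvs` folder, and uses only the standard library.

```python
import os
import shutil
import tempfile
import zipfile

# Work in a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "csvs")
    os.makedirs(src, exist_ok=True)  # portable equivalent of `mkdir -p`

    # Two placeholder files standing in for the per-country exports
    for country in ["Turkey", "Georgia"]:
        with open(os.path.join(src, country + ".csv"), "w") as fh:
            fh.write("id,name\n")

    # make_archive appends '.zip' to the base name and returns the archive path
    archive = shutil.make_archive(os.path.join(tmp, "csvs"), "zip", src)

    # List the archive contents to confirm every CSV was included
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)
```

In the notebook itself, the same check would amount to opening `csvs.zip` with `zipfile.ZipFile` and comparing the member count with the number of countries.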