Add files via upload
timchen0907 committed Apr 22, 2021
1 parent 108a65e commit 18f1d47
Showing 1 changed file with 222 additions and 0 deletions.
@@ -0,0 +1,222 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python377jvsc74a57bd0e1977659ddea367e51ae7d236f6dca82c64f40282afc6ae3d301bd694bd63df5",
"display_name": "Python 3.7.7 64-bit"
},
"metadata": {
"interpreter": {
"hash": "e1977659ddea367e51ae7d236f6dca82c64f40282afc6ae3d301bd694bd63df5"
}
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
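{
"source": [
"# `How to Use Machine Learning to Raise the Potential Closing Rate in Real Estate Brokerage, Series 1: Data Processing Fundamentals`\n",
"## Authors: 陳政廷, 王裕萍, 謝豐檍 (invited authors, Taiwan Marketing Research), 鍾皓軒 (founder of Taiwan Marketing Research Co., Ltd.)"
],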
"cell_type": "markdown",
"metadata": {}
},
{
"source": [
"## The raw data is available at [this link](https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data). Download it, place it in the same working directory as this ipynb file, and then run the code below."
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df_train = pd.read_json(\"train.json\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" bathrooms bedrooms building_id \\\n",
"4 1.0 1 8579a0b0d54db803821a35a4a615e97a \n",
"6 1.0 2 b8e75fc949a6cd8225b455648a951712 \n",
"9 1.0 2 cd759a988b8f23924b5a2058d5ab2b49 \n",
"10 1.5 3 53a5b119ba8f7b61d4e010512e0dfc85 \n",
"15 1.0 0 bfb9405149bfff42a92980b594c28234 \n",
"\n",
" created description \\\n",
"4 2016-06-16 05:55:27 Spacious 1 Bedroom 1 Bathroom in Williamsburg!... \n",
"6 2016-06-01 05:44:33 BRAND NEW GUT RENOVATED TRUE 2 BEDROOMFind you... \n",
"9 2016-06-14 15:19:59 **FLEX 2 BEDROOM WITH FULL PRESSURIZED WALL**L... \n",
"10 2016-06-24 07:54:24 A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ... \n",
"15 2016-06-28 03:50:23 Over-sized Studio w abundant closets. Availabl... \n",
"\n",
" display_address features \\\n",
"4 145 Borinquen Place [Dining Room, Pre-War, Laundry in Building, Di... \n",
"6 East 44th [Doorman, Elevator, Laundry in Building, Dishw... \n",
"9 East 56th Street [Doorman, Elevator, Laundry in Building, Laund... \n",
"10 Metropolitan Avenue [] \n",
"15 East 34th Street [Doorman, Elevator, Fitness Center, Laundry in... \n",
"\n",
" latitude listing_id longitude manager_id \\\n",
"4 40.7108 7170325 -73.9539 a10db4590843d78c784171a107bdacb4 \n",
"6 40.7513 7092344 -73.9722 955db33477af4f40004820b4aed804a0 \n",
"9 40.7575 7158677 -73.9625 c8b10a317b766204f08e613cef4ce7a0 \n",
"10 40.7145 7211212 -73.9425 5ba989232d0489da1b5f2c45f6688adc \n",
"15 40.7439 7225292 -73.9743 2c3b41f588fbb5234d8a1e885a436cfa \n",
"\n",
" photos price \\\n",
"4 [https://photos.renthop.com/2/7170325_3bb5ac84... 2400 \n",
"6 [https://photos.renthop.com/2/7092344_7663c19a... 3800 \n",
"9 [https://photos.renthop.com/2/7158677_c897a134... 3495 \n",
"10 [https://photos.renthop.com/2/7211212_1ed4542e... 3000 \n",
"15 [https://photos.renthop.com/2/7225292_901f1984... 2795 \n",
"\n",
" street_address interest_level \n",
"4 145 Borinquen Place medium \n",
"6 230 East 44th low \n",
"9 405 East 56th Street medium \n",
"10 792 Metropolitan Avenue medium \n",
"15 340 East 34th Street low "
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>bathrooms</th>\n <th>bedrooms</th>\n <th>building_id</th>\n <th>created</th>\n <th>description</th>\n <th>display_address</th>\n <th>features</th>\n <th>latitude</th>\n <th>listing_id</th>\n <th>longitude</th>\n <th>manager_id</th>\n <th>photos</th>\n <th>price</th>\n <th>street_address</th>\n <th>interest_level</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>4</th>\n <td>1.0</td>\n <td>1</td>\n <td>8579a0b0d54db803821a35a4a615e97a</td>\n <td>2016-06-16 05:55:27</td>\n <td>Spacious 1 Bedroom 1 Bathroom in Williamsburg!...</td>\n <td>145 Borinquen Place</td>\n <td>[Dining Room, Pre-War, Laundry in Building, Di...</td>\n <td>40.7108</td>\n <td>7170325</td>\n <td>-73.9539</td>\n <td>a10db4590843d78c784171a107bdacb4</td>\n <td>[https://photos.renthop.com/2/7170325_3bb5ac84...</td>\n <td>2400</td>\n <td>145 Borinquen Place</td>\n <td>medium</td>\n </tr>\n <tr>\n <th>6</th>\n <td>1.0</td>\n <td>2</td>\n <td>b8e75fc949a6cd8225b455648a951712</td>\n <td>2016-06-01 05:44:33</td>\n <td>BRAND NEW GUT RENOVATED TRUE 2 BEDROOMFind you...</td>\n <td>East 44th</td>\n <td>[Doorman, Elevator, Laundry in Building, Dishw...</td>\n <td>40.7513</td>\n <td>7092344</td>\n <td>-73.9722</td>\n <td>955db33477af4f40004820b4aed804a0</td>\n <td>[https://photos.renthop.com/2/7092344_7663c19a...</td>\n <td>3800</td>\n <td>230 East 44th</td>\n <td>low</td>\n </tr>\n <tr>\n <th>9</th>\n <td>1.0</td>\n <td>2</td>\n <td>cd759a988b8f23924b5a2058d5ab2b49</td>\n <td>2016-06-14 15:19:59</td>\n <td>**FLEX 2 BEDROOM WITH FULL PRESSURIZED WALL**L...</td>\n <td>East 56th Street</td>\n <td>[Doorman, Elevator, Laundry in Building, Laund...</td>\n <td>40.7575</td>\n <td>7158677</td>\n <td>-73.9625</td>\n <td>c8b10a317b766204f08e613cef4ce7a0</td>\n <td>[https://photos.renthop.com/2/7158677_c897a134...</td>\n <td>3495</td>\n <td>405 East 56th Street</td>\n <td>medium</td>\n </tr>\n <tr>\n <th>10</th>\n <td>1.5</td>\n <td>3</td>\n <td>53a5b119ba8f7b61d4e010512e0dfc85</td>\n <td>2016-06-24 07:54:24</td>\n <td>A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...</td>\n <td>Metropolitan Avenue</td>\n <td>[]</td>\n <td>40.7145</td>\n <td>7211212</td>\n <td>-73.9425</td>\n <td>5ba989232d0489da1b5f2c45f6688adc</td>\n <td>[https://photos.renthop.com/2/7211212_1ed4542e...</td>\n <td>3000</td>\n <td>792 Metropolitan Avenue</td>\n <td>medium</td>\n </tr>\n <tr>\n <th>15</th>\n <td>1.0</td>\n <td>0</td>\n <td>bfb9405149bfff42a92980b594c28234</td>\n <td>2016-06-28 03:50:23</td>\n <td>Over-sized Studio w abundant closets. Availabl...</td>\n <td>East 34th Street</td>\n <td>[Doorman, Elevator, Fitness Center, Laundry in...</td>\n <td>40.7439</td>\n <td>7225292</td>\n <td>-73.9743</td>\n <td>2c3b41f588fbb5234d8a1e885a436cfa</td>\n <td>[https://photos.renthop.com/2/7225292_901f1984...</td>\n <td>2795</td>\n <td>340 East 34th Street</td>\n <td>low</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 3
}
],
"source": [
"df_train.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\nInt64Index: 49352 entries, 4 to 124009\nData columns (total 15 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 bathrooms 49352 non-null float64\n 1 bedrooms 49352 non-null int64 \n 2 building_id 49352 non-null object \n 3 created 49352 non-null object \n 4 description 49352 non-null object \n 5 display_address 49352 non-null object \n 6 features 49352 non-null object \n 7 latitude 49352 non-null float64\n 8 listing_id 49352 non-null int64 \n 9 longitude 49352 non-null float64\n 10 manager_id 49352 non-null object \n 11 photos 49352 non-null object \n 12 price 49352 non-null int64 \n 13 street_address 49352 non-null object \n 14 interest_level 49352 non-null object \ndtypes: float64(3), int64(3), object(9)\nmemory usage: 6.0+ MB\n"
]
}
],
"source": [
"df_train.info()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Count-based features: number of listed features, photos, and description words\n",
"df_train[\"num_features\"] = df_train[\"features\"].apply(len)\n",
"df_train[\"num_photos\"] = df_train[\"photos\"].apply(len)\n",
"df_train[\"num_description_words\"] = df_train[\"description\"].apply(lambda x: len(x.split(\" \")))\n",
"\n",
"# Loop-based equivalent of the lambda above; defined for reference, not called here\n",
"def calculate(n_words):\n",
"    description_test = []\n",
"    for i in n_words:\n",
"        description_test.append(len(i.split(\" \")))\n",
"    return description_test\n",
"\n",
"# Parse the timestamp and split it into year, month, and day features\n",
"df_train[\"created\"] = pd.to_datetime(df_train[\"created\"])\n",
"df_train[\"created_year\"] = df_train[\"created\"].dt.year\n",
"df_train[\"created_month\"] = df_train[\"created\"].dt.month\n",
"df_train[\"created_day\"] = df_train[\"created\"].dt.day"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"choose_columns=[\"bathrooms\", \"bedrooms\", \"latitude\", \"longitude\", \"price\",\n",
" \"num_photos\", \"num_features\", \"num_description_words\",\n",
" \"created_year\", \"created_month\", \"created_day\",\"interest_level\"]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"df_train=df_train[choose_columns]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" bathrooms bedrooms latitude longitude price num_photos num_features \\\n",
"4 1.0 1 40.7108 -73.9539 2400 12 7 \n",
"6 1.0 2 40.7513 -73.9722 3800 6 6 \n",
"9 1.0 2 40.7575 -73.9625 3495 6 6 \n",
"10 1.5 3 40.7145 -73.9425 3000 5 0 \n",
"15 1.0 0 40.7439 -73.9743 2795 4 4 \n",
"\n",
" num_description_words created_year created_month created_day \\\n",
"4 77 2016 6 16 \n",
"6 131 2016 6 1 \n",
"9 119 2016 6 14 \n",
"10 95 2016 6 24 \n",
"15 41 2016 6 28 \n",
"\n",
" interest_level \n",
"4 medium \n",
"6 low \n",
"9 medium \n",
"10 medium \n",
"15 low "
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>bathrooms</th>\n <th>bedrooms</th>\n <th>latitude</th>\n <th>longitude</th>\n <th>price</th>\n <th>num_photos</th>\n <th>num_features</th>\n <th>num_description_words</th>\n <th>created_year</th>\n <th>created_month</th>\n <th>created_day</th>\n <th>interest_level</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>4</th>\n <td>1.0</td>\n <td>1</td>\n <td>40.7108</td>\n <td>-73.9539</td>\n <td>2400</td>\n <td>12</td>\n <td>7</td>\n <td>77</td>\n <td>2016</td>\n <td>6</td>\n <td>16</td>\n <td>medium</td>\n </tr>\n <tr>\n <th>6</th>\n <td>1.0</td>\n <td>2</td>\n <td>40.7513</td>\n <td>-73.9722</td>\n <td>3800</td>\n <td>6</td>\n <td>6</td>\n <td>131</td>\n <td>2016</td>\n <td>6</td>\n <td>1</td>\n <td>low</td>\n </tr>\n <tr>\n <th>9</th>\n <td>1.0</td>\n <td>2</td>\n <td>40.7575</td>\n <td>-73.9625</td>\n <td>3495</td>\n <td>6</td>\n <td>6</td>\n <td>119</td>\n <td>2016</td>\n <td>6</td>\n <td>14</td>\n <td>medium</td>\n </tr>\n <tr>\n <th>10</th>\n <td>1.5</td>\n <td>3</td>\n <td>40.7145</td>\n <td>-73.9425</td>\n <td>3000</td>\n <td>5</td>\n <td>0</td>\n <td>95</td>\n <td>2016</td>\n <td>6</td>\n <td>24</td>\n <td>medium</td>\n </tr>\n <tr>\n <th>15</th>\n <td>1.0</td>\n <td>0</td>\n <td>40.7439</td>\n <td>-73.9743</td>\n <td>2795</td>\n <td>4</td>\n <td>4</td>\n <td>41</td>\n <td>2016</td>\n <td>6</td>\n <td>28</td>\n <td>low</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 8
}
],
"source": [
"df_train.head()"
]
}
]
}
