
In [153]:
import pandas as pd
import yfinance as yf
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


In [156]:

# Load the data from the local nifty-500.csv file (yfinance is imported but not used here)
def get_data():
    data = pd.read_csv('nifty-500.csv')
    # Drop three tickers. str.contains matches substrings, so any NSE code
    # containing one of these strings is removed.
    data = data[~data['NSE Code'].str.contains('MCX|KSB|BHARATFORG')]

    return data

data=get_data()

In [157]:
data.shape

(498, 21)

In [158]:
# Identifier columns and metrics left out of the clustering features
excluded = ("BSE Code", "NSE Code", "Name", "Industry", "Current Price",
            "Market Capitalization", "Operating profit", "asset turnover ratio",
            "Price to Free Cash Flow", "Promoter holding", "Change in promoter holding")
features = [col for col in data.columns if col not in excluded]


In [159]:
data.columns

Index(['Name', 'BSE Code', 'NSE Code', 'Industry', 'Current Price',
       'Return on equity', 'Price to Earning', 'Price to Sales',
       'Debt to equity', 'Market Capitalization', 'Price to book value',
       'Return on capital employed', 'OPM', 'Current ratio',
       'Operating profit', 'Sales growth', 'Asset Turnover Ratio',
       'asset turnover ratio', 'Price to Free Cash Flow', 'Promoter holding',
       'Change in promoter holding'],
      dtype='object')

In [160]:
# Note: despite the name, these values are not standardized; missing values are filled with 0.001
scaled_data=data[features].fillna(0.001).values
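
StandardScaler is imported at the top of the notebook but never applied, so the array above holds raw ratios. K-means relies on Euclidean distance, so a column with a very wide range (Price to Earning reaches four digits in the output further down) can dominate the distance. The following is a minimal sketch of standardizing the features first. The helper name `standardize_features` and the toy values are hypothetical, and the 0.001 fill value is simply carried over from the cell above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def standardize_features(frame, columns, fill_value=0.001):
    """Return the selected columns as a z-scored array (mean 0, std 1 per column)."""
    # Same missing-value handling as the notebook cell: a constant fill.
    filled = frame[columns].fillna(fill_value)
    # StandardScaler subtracts each column's mean and divides by its standard deviation.
    return StandardScaler().fit_transform(filled)


# Hypothetical toy values, not taken from nifty-500.csv.
toy = pd.DataFrame({
    "Price to Earning": [10.0, 20.0, 1330.0, None],
    "Debt to equity": [0.1, 0.3, 0.9, 0.5],
})
toy_scaled = standardize_features(toy, ["Price to Earning", "Debt to equity"])
print(toy_scaled.round(3))
```

In the notebook this would be called as `standardize_features(data, features)`. The clusters and medians printed below come from the unscaled values, so they would likely change if scaling were applied.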

In [164]:
# Use the feature matrix built above for clustering
X = scaled_data

# Perform KMeans clustering with 4 clusters
kmeans = KMeans(n_clusters=4, random_state=42)
data['Cluster'] = kmeans.fit_predict(X)
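
The number of clusters is fixed at 4 in the cell above. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below is an illustration on synthetic data, not the notebook's method; `silhouette_by_k` and the toy blobs are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=42):
    """Fit KMeans for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        # Scores range from -1 to 1; higher means tighter, better separated clusters.
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic data: three well-separated blobs of 30 points each (assumption for the demo).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
toy_X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

toy_scores = silhouette_by_k(toy_X, range(2, 6))
best_k = max(toy_scores, key=toy_scores.get)
print(toy_scores, best_k)
```

On the notebook's data the call would be `silhouette_by_k(X, range(2, 9))`. The score is only a guide: a cluster of two outliers can score well without being useful.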



In [167]:
# Analyze and interpret clusters
for i in range(4):
    cluster_data = data[data['Cluster'] == i]
    print(f"Cluster {i}:\n", cluster_data[features].median())
    print(f"Cluster_shape {i}:\n", cluster_data.shape)


Cluster 0:
 Return on equity              14.450
Price to Earning              24.955
Price to Sales                 2.470
Debt to equity                 0.300
Price to book value            3.565
Return on capital employed    14.080
OPM                           18.330
Current ratio                  1.745
Sales growth                   9.830
Asset Turnover Ratio           1.040
dtype: float64
Cluster_shape 0:
 (319, 22)
Cluster 1:
 Return on equity                -3.140
Price to Earning              1330.460
Price to Sales                   4.115
Debt to equity                   0.870
Price to book value             18.400
Return on capital employed       3.190
OPM                              6.135
Current ratio                    1.070
Sales growth                    16.140
Asset Turnover Ratio             1.790
dtype: float64
Cluster_shape 1:
 (2, 22)
Cluster 2:
 Return on equity                5.44
Price to Earning              257.48
Price to Sales                  5.77
Debt to e
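
The per-cluster printout above can also be condensed into a single table with one row per cluster, which is easier to scan than a loop of print statements. This is a sketch under assumptions: `summarize_clusters` is a hypothetical helper and the toy frame is made up for illustration.

```python
import pandas as pd


def summarize_clusters(frame, feature_columns, label_column="Cluster"):
    """Return per-cluster medians with the cluster size as the first column."""
    grouped = frame.groupby(label_column)
    medians = grouped[feature_columns].median()
    # size() is indexed by cluster label, so it aligns with the medians table.
    medians.insert(0, "n_companies", grouped.size())
    return medians


# Hypothetical toy values, not taken from nifty-500.csv.
toy = pd.DataFrame({
    "Return on equity": [14.0, 15.0, -3.0, 18.0, 19.0],
    "Price to Earning": [24.0, 26.0, 1330.0, 30.0, 34.0],
    "Cluster": [0, 0, 1, 3, 3],
})
summary = summarize_clusters(toy, ["Return on equity", "Price to Earning"])
print(summary)
```

In the notebook the equivalent call would be `summarize_clusters(data, features)`.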

#cluster 0 banks and Financial services
#cluster 1 pharmaceuticals
#cluster 2 life insurance
#cluster 6 is a mix of sectors

In [172]:
data[data['Cluster']==0].to_csv('well_established_companies.csv')

Based on the output, here are some potential insights we can derive from the clustering analysis:

1. Cluster Characteristics:
   - Cluster 0 seems to represent companies with relatively healthy financial ratios - high return on equity (14.45%), reasonable price-to-earnings (24.955), low debt-to-equity (0.3), and good operating profitability (OPM 18.33%). This cluster has the largest number of companies (319).
   - Cluster 1 appears to be an outlier cluster with only 2 companies. These companies have a very high price-to-earnings ratio (1330.46) alongside a negative return on equity (-3.14%), so the ratio may reflect very thin earnings, overvaluation, or high growth expectations priced in.
   - Cluster 2 contains 11 companies with weak profitability metrics - low return on equity (5.44%), a very high price-to-earnings ratio (257.48), somewhat higher debt-to-equity than Cluster 0 (0.42 vs. 0.3), and lower operating profitability (OPM 10.69%).
   - Cluster 3 represents 166 companies with strong profitability and growth metrics - high return on equity (18.11%), high return on capital employed (19.935%), and good sales growth (13.29%).

2. Investment Perspectives:
   - Cluster 0 and Cluster 3 could be attractive for value investors, as they contain companies with strong fundamentals and reasonable valuations.
   - Cluster 1 may interest growth investors willing to pay a premium for high-growth companies, but the high valuations also carry significant risk.
   - Cluster 2 may be less appealing for investment, as the companies exhibit weaker profitability and higher debt levels.

3. Industry/Sector Analysis:
   - By analyzing which industries or sectors the companies in each cluster belong to, we may be able to identify potential industry trends or characteristics that contribute to the clustering of financial ratios.

4. Peer Comparison:
   - Within each cluster, companies can be compared to their peers based on the similarities in their financial metrics. This could help identify potential outperformers or underperformers within each group.
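
The industry analysis and peer comparison described in items 3 and 4 can be sketched with pandas. The helper names `industry_mix` and `peer_percentile` and the toy rows are hypothetical; the `Industry` and `Cluster` column names match the DataFrame used in the notebook.

```python
import pandas as pd


def industry_mix(frame, label_column="Cluster", industry_column="Industry"):
    """Share of each industry within each cluster (every row sums to 1)."""
    return pd.crosstab(frame[label_column], frame[industry_column], normalize="index")


def peer_percentile(frame, metric, label_column="Cluster"):
    """Percentile rank of `metric` among companies in the same cluster."""
    return frame.groupby(label_column)[metric].rank(pct=True)


# Hypothetical toy rows, not taken from nifty-500.csv.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Industry": ["Banks", "Banks", "Pharma", "Pharma", "Banks"],
    "Cluster": [0, 0, 0, 3, 3],
    "Return on equity": [10.0, 20.0, 15.0, 18.0, 12.0],
})

mix = industry_mix(toy)
toy["ROE percentile in cluster"] = peer_percentile(toy, "Return on equity")
print(mix)
print(toy[["Name", "Cluster", "ROE percentile in cluster"]])
```

A company with a percentile close to 1.0 ranks near the top of its own cluster on that metric, which is one simple way to flag potential outperformers among peers.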

Overall, this clustering analysis provides a high-level segmentation of companies based on their financial characteristics, which can be useful for investment decision-making, industry analysis, and peer comparisons. However, further analysis and due diligence would be required to make informed investment decisions.
(claude ai)

Here's a breakdown of the cluster profiles, including the smaller, potentially outlier-driven clusters:

**Cluster Profiles**

* **Cluster 0 (Largest Cluster):**  Healthy profitability (ROE), moderate valuations, low-to-moderate debt, decent growth, and asset efficiency. This seems like a cluster of solid, balanced companies.
* **Cluster 1 (Negative Profitability):** Companies facing challenges, with negative profitability, extremely high valuations (especially P/E), and moderate debt. This might indicate companies in turnaround situations or those with very unique business models where standard valuations don't apply.
* **Cluster 2 (Low Profitability, High Valuations):**  Lower profitability, very high valuations, moderate debt, decent growth, and lower asset turnover. These companies might be speculative, where investors are betting on future potential despite lower current earnings.
* **Cluster 3 (Strong Performers):** High profitability, relatively high valuations, low debt, decent growth, and good asset turnover. This cluster likely contains strong performers with growth expectations priced into their valuations.

**Observations**

* **Profitability:** Your clusters primarily differ based on profitability. Cluster 0 and Cluster 3 showcase healthy profitability, while clusters 1 and 2 exhibit struggles.
* **Valuation:** Clusters 1 and 2 stand out with very high valuations, especially in relation to their profitability.  
* **Focus Points:** It's helpful that you likely removed some outliers. This allows us to focus more clearly on the larger patterns.

**Actions to Take**

1. **Industry Analysis:**  Knowing the industries these companies represent will add a significant layer to the analysis. Different industries have naturally distinct financial profiles.
2. **Growth Expectations:** Consider whether the high valuations in Clusters 1, 2, and 3 could be justified by above-average growth expectations. Analyze market forecasts or company projections for the industries involved in these clusters.
3. **Qualitative Factors:** Supplement this quantitative view with qualitative research. Look into the individual stories behind companies in clusters 1 and 2 to understand their high valuations despite weak profitability.
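
As a starting point for the qualitative follow-up on clusters 1 and 2, the companies in those clusters can be pulled into a short list sorted by valuation. This is a minimal sketch: `flag_for_review` is a hypothetical helper and the toy rows are invented, while the column names follow the notebook's DataFrame.

```python
import pandas as pd


def flag_for_review(frame, clusters, columns, sort_by="Price to Earning"):
    """Return rows from the given clusters, highest `sort_by` value first."""
    subset = frame[frame["Cluster"].isin(clusters)]
    return subset.sort_values(sort_by, ascending=False)[columns]


# Hypothetical toy rows, not taken from nifty-500.csv.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Industry": ["Banks", "Retail", "Software", "Pharma"],
    "Price to Earning": [25.0, 257.0, 1330.0, 30.0],
    "Return on equity": [14.0, 5.0, -3.0, 18.0],
    "Cluster": [0, 2, 1, 3],
})

review_list = flag_for_review(
    toy, clusters=[1, 2],
    columns=["Name", "Industry", "Price to Earning", "Return on equity"],
)
print(review_list)
```

In the notebook the same call on `data` would list the 13 companies in clusters 1 and 2 for closer reading.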

**Questions to Deepen Analysis**

* Do you know the industries represented in these clusters?
* Are you interested in exploring why Clusters 1 and 2 have high valuations despite weaker profitability?

(gemini)