'home-depot-sentence-similarity.csv' is missing #982

Open
b4naki opened this issue Dec 16, 2022 · 6 comments

Comments

@b4naki

b4naki commented Dec 16, 2022

In the sentence similarity project, the file referenced by the path
var dataPath = Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv");
does not exist.
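
A quick way to confirm this is to resolve the relative path and check for the file before loading it. The sketch below is illustrative only: `DataFileCheck` and `TryResolve` are hypothetical names, not part of the sample.

```csharp
using System;
using System.IO;

public static class DataFileCheck
{
    // Resolves relativePath against baseDir and reports whether a file exists there.
    // fullPath always receives the absolute path that was checked.
    public static bool TryResolve(string baseDir, string relativePath, out string fullPath)
    {
        fullPath = Path.GetFullPath(Path.Combine(baseDir, relativePath));
        return File.Exists(fullPath);
    }
}
```

For example, `DataFileCheck.TryResolve(Environment.CurrentDirectory, Path.Combine("..", "..", "..", "..", "Data", "home-depot-sentence-similarity.csv"), out var path)` returns false when the file is missing and leaves the checked location in `path`. Building the path with `Path.Combine` rather than a literal `..\..\` string also works on systems where `\` is not a directory separator.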

@DarrenTweedale

I think this file can be downloaded from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data
Download train.csv.zip, extract it, rename the CSV to home-depot-sentence-similarity.csv, and place it in the Data folder.
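
The extract-and-rename step can also be scripted. This is a minimal sketch, assuming train.csv.zip has already been downloaded by hand, that the archive contains a single entry named train.csv, and that the project targets .NET Core 3.0 or later (for the `File.Move` overload with `overwrite`). `TrainCsvSetup` and `ExtractAndRename` are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;

public static class TrainCsvSetup
{
    // Extracts zipPath into dataDir, then renames train.csv to the file name
    // the sample expects. Returns the full path of the renamed file.
    public static string ExtractAndRename(string zipPath, string dataDir)
    {
        Directory.CreateDirectory(dataDir);
        ZipFile.ExtractToDirectory(zipPath, dataDir, overwriteFiles: true);

        // Assumption: the archive holds exactly one entry called train.csv.
        string extracted = Path.Combine(dataDir, "train.csv");
        string target = Path.Combine(dataDir, "home-depot-sentence-similarity.csv");
        File.Move(extracted, target, overwrite: true);
        return target;
    }
}
```

Calling `TrainCsvSetup.ExtractAndRename("train.csv.zip", "Data")` would leave Data/home-depot-sentence-similarity.csv in place, relative to the current directory.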

@b4naki
Author

b4naki commented Dec 21, 2022

I think this file can be downloaded from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data download train.csv.zip, extract, and then rename the csv to home-depot-sentence-similarity.csv and place into the data folder

Thank you, this worked.

@Symbai

Symbai commented Jun 11, 2023

Is there a way to download this without entering a phone number?!

@wushifeng

Maybe:
1. Download home-depot-product-search-relevance.zip from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data
2. Extract train.csv.zip and product_descriptions.csv.zip into the Data directory.
3. Use the code below to generate home-depot-sentence-similarity.csv.

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace SentenceSimilarity
{
    internal class GenData
    {
        //  id product_uid product_title search_term relevance
        //  2	100001	Simpson Strong-Tie 12-Gauge Angle   angle bracket	3
        public class HomeDepot
        {
            [LoadColumn(0)]
            public int id { get; set; }

            [LoadColumn(1)]
            public int product_uid { get; set; }

            [LoadColumn(2)]
            public string product_title { get; set; }

            [LoadColumn(3)]
            public string search_term { get; set; }

            [LoadColumn(4)]
            public string relevance { get; set; }
        }

        // https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.custommappingcatalog.custommapping?view=ml-dotnet
        [CustomMappingFactoryAttribute("product_description")]
        private class ProdDescCustomAction : CustomMappingFactory<HomeDepot, CustomMappingOutput>
        {
            // We define the custom mapping between input and output rows that will
            // be applied by the transformation.
            public static void CustomAction(HomeDepot input, CustomMappingOutput
                output) => output.product_description = prodDesc[input.product_uid.ToString()];

            public override Action<HomeDepot, CustomMappingOutput> GetMapping()
                => CustomAction;
        }
        // Defines only the column to be generated by the custom mapping
        // transformation in addition to the columns already present.
        private class CustomMappingOutput
        {
            public string product_description { get; set; }
        }

        static Dictionary<string, string> prodDesc = new Dictionary<string, string>();

        static void Main(string[] args)
        {
            var mlContext = new MLContext(seed: 1);

            var DataPath = Path.GetFullPath(@"..\..\..\..\Data\product_descriptions.csv");
            {
                IDataView dv = mlContext.Data.LoadFromTextFile(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true,
                    columns: new[]
                    {
                        new TextLoader.Column("product_uid", DataKind.String, 0),
                        new TextLoader.Column("product_description", DataKind.String, 1)
                    });
                // Cache up to 150,000 description rows in memory, keyed by product_uid.
                foreach (var row in dv.Preview(maxRows: 150_000).RowView)
                {
                    string uid = "", desc = "";
                    foreach (KeyValuePair<string, object> column in row.Values)
                    {
                        if (column.Key == "product_uid")
                        {
                            uid = column.Value.ToString();
                        }
                        else
                        {
                            desc = column.Value.ToString();
                        }
                    }

                    prodDesc[uid] = desc;
                }
            }

            DataPath = Path.GetFullPath(@"..\..\..\..\Data\train.csv");
            IDataView dataView = mlContext.Data.LoadFromTextFile<HomeDepot>(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true);
            var preViewTransformedData = dataView.Preview(maxRows: 5);
            foreach (var row in preViewTransformedData.RowView)
            {
                var ColumnCollection = row.Values;
                string lineToPrint = "Row--> ";
                foreach (KeyValuePair<string, object> column in ColumnCollection)
                {
                    lineToPrint += $"| {column.Key}:{column.Value}";
                }
                Console.WriteLine(lineToPrint + "\n");
            }

            var pipeline = mlContext.Transforms.CustomMapping(new ProdDescCustomAction().GetMapping(), contractName: "product_description");
            var transformedData = pipeline.Fit(dataView).Transform(dataView);

            // Only needed when loading a saved model that contains the custom mapping:
            //mlContext.ComponentCatalog.RegisterAssembly(typeof(ProdDescCustomAction).Assembly);
            Console.WriteLine("save file");
            using FileStream fs = new FileStream(Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv"), FileMode.Create);
            mlContext.Data.SaveAsText(transformedData, fs, schema: false, separatorChar:',');
        }
    }
}

After these steps, the data file home-depot-sentence-similarity.csv should appear in the Data folder.
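
To sanity-check the generated file, it can help to print its first few lines and confirm that a product_description column was appended. The helper below is only a sketch; `CsvPeek` and `FirstLines` are hypothetical names, and it reads raw lines without parsing quoted fields.

```csharp
using System;
using System.IO;
using System.Linq;

public static class CsvPeek
{
    // Returns up to `count` leading lines of the file without loading all of it.
    public static string[] FirstLines(string path, int count)
        => File.ReadLines(path).Take(count).ToArray();
}
```

For example, `foreach (var line in CsvPeek.FirstLines(dataPath, 3)) Console.WriteLine(line);` prints the first three lines, where `dataPath` is the path used by the sample.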

@Symbai

Symbai commented Dec 18, 2023

maybe:
1 download data home-depot-product-search-relevance.zip from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data

Reposting the link does not help. The problem that a phone number is required still exists. I cannot download the file without logging in. I don't have a Google account (creating one requires my phone number), and the same goes for others. Even creating a Kaggle account asks for my phone number.

@wushifeng

Here is the processed data file.
home-depot-sentence-similarity.zip
